LINUX & HPC : Advanced Large Scale Computing at a Glance !: Understanding BPF Trace Probes and BTF: Practical Insights from Real-World Debugging

Introduction

bpftrace is a powerful tool for dynamic tracing in Linux, enabling developers and system engineers to observe kernel and user-space events in real time. While working with bpftrace, you will encounter different probe types and kernel features such as BTF (BPF Type Format). This post explains what these probes mean, why BTF matters, and how to troubleshoot common issues.

What bpftrace probes mean
- The main probe types are tracepoint, rawtracepoint, kprobe, and fentry:
  - tracepoint: Stable kernel instrumentation points for syscalls and subsystems, with decoded arguments.
  - rawtracepoint: Low-level hooks on the same tracepoints that pass raw arguments with minimal decoding.
  - kprobe: Dynamic probes on kernel function entry, attached by symbol name; not a stable interface across kernel versions.
  - fentry: Modern BPF function-entry probes that use BTF type information to expose typed arguments.

What is BTF and why it matters

- BPF Type Format (BTF) provides kernel type metadata for BPF programs.
- It enables automatic argument decoding and advanced probes such as fentry.
- To check whether BTF is present, look for /sys/kernel/btf/vmlinux. If it is missing, set BPFTRACE_KERNEL_SOURCE to point bpftrace at the kernel sources, or use simpler probes.
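
The BTF check described above can be scripted. The following is a minimal sketch: the helper name btf_status and its optional path argument are illustrative assumptions, and only the default path comes from the text.

```shell
# btf_status [PATH]: report whether kernel BTF is available.
# The default path is the one named in the text; the optional argument is
# only an illustrative convenience for pointing the check elsewhere.
btf_status() {
    btf="${1:-/sys/kernel/btf/vmlinux}"
    if [ -r "$btf" ]; then
        echo "BTF present: $btf"
    else
        echo "BTF missing: set BPFTRACE_KERNEL_SOURCE or use simpler probes"
    fi
}

# Check the running kernel.
btf_status
```

If BTF turns out to be missing, the environment variable is set on the bpftrace command line, for example BPFTRACE_KERNEL_SOURCE=/path/to/kernel/source bpftrace -e '...', where the path is a placeholder for wherever the matching kernel sources or headers live on your system.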

Common errors and fixes

- Example error: error: field has incomplete type 'const enum landlock_rule_type'
  - Cause: Incomplete type information due to missing or partial BTF.
  - Fix: Use the raw_syscalls tracepoints or point bpftrace to the kernel sources.
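
A fallback along these lines might look like the sketch below. It assumes that the raw_syscalls:sys_enter tracepoint exposes the syscall number as args->id (some bpftrace versions prefer the spelling args.id), and it takes that number as a parameter because syscall numbers differ between architectures. The helper name raw_syscall_prog is hypothetical.

```shell
# raw_syscall_prog NR: hypothetical helper that prints a bpftrace program
# counting, per process, calls to the syscall with number NR.
# NR is architecture-specific; look it up for your platform (for example
# in <asm/unistd.h>) rather than reusing the number shown here.
raw_syscall_prog() {
    printf 'tracepoint:raw_syscalls:sys_enter /args->id == %d/ { @[comm] = count(); }\n' "$1"
}

# Print the program for syscall number 257, which is openat on x86_64.
raw_syscall_prog 257
# To run it (needs root): bpftrace -e "$(raw_syscall_prog 257)"
```

Because this probe does not need the per-syscall argument structures, it avoids the type definitions that trigger the incomplete-type error, at the cost of losing decoded arguments such as the file name.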

Practical examples

- bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", comm); }'
  Meaning: Prints the process name each time the openat() syscall is called.
- Alternative for PPC/RHEL when BTF is incomplete: bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); } interval:s:5 { print(@); clear(@); }'
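
Tips for running tests and scripts

- bpftrace's own test suite can be run with ctest from the build directory; functional one-liners make quick smoke tests.
- To bound how long a trace runs, use an interval probe or run bpftrace with -c 'sleep N' so that it exits when the command finishes.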
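
Both ways of bounding a run can be sketched as follows. The helper name timed_prog is hypothetical; the program it prints pairs the syscall-counting probe with an interval probe that calls exit(), so bpftrace stops on its own and prints the map.

```shell
# timed_prog SECONDS: hypothetical helper that prints a bpftrace program
# which counts syscalls per process and exits after SECONDS seconds
# (default 5).
timed_prog() {
    printf 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); } interval:s:%d { exit(); }\n' "${1:-5}"
}

# Print the 10-second variant.
timed_prog 10
# Option 1 (needs root): bpftrace -e "$(timed_prog 10)"
# Option 2: let a child command bound the run instead:
#   bpftrace -c 'sleep 10' -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
```

The interval form keeps everything inside the script, while the -c form is convenient when the thing being traced is itself a command with a natural end.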

Background on BPF:

BPF (Berkeley Packet Filter) started as a packet filtering mechanism in Unix systems but has evolved into eBPF (Extended BPF) in modern Linux kernels. eBPF is a technology that allows you to run sandboxed programs inside the kernel without changing kernel source code or loading kernel modules.

Key idea: eBPF programs are checked by the in-kernel verifier and typically JIT-compiled, making them safe and efficient.
Capabilities: Observability, networking, security, and performance monitoring.

What is bpftrace?

bpftrace is a high-level front-end for eBPF. It provides a simple scripting language to attach probes to kernel/user events and collect data. It’s similar to DTrace but for Linux.

Why is it needed?

Traditional monitoring tools often lack deep kernel visibility.
eBPF allows low-overhead, dynamic tracing without rebooting or patching the kernel.
Useful for:
- Performance analysis (CPU, I/O, latency)
- Debugging production issues
- Security auditing

When is it applied?

When you need real-time insights into kernel or application behavior.
Examples:
- Trace system calls (openat, read, write)
- Monitor network packets
- Profile application performance without intrusive instrumentation

Who can use this feature?

System administrators: For troubleshooting and performance tuning.
Kernel developers: For debugging kernel internals.
SRE/DevOps engineers: For observability in production.
Security teams: For detecting anomalies and enforcing policies.

Examples of bpftrace commands

bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", comm); }'

The command breaks down as follows.

bpftrace -e '...': Run an inline bpftrace program given in quotes.

tracepoint:syscalls:sys_enter_openat: Attach a probe to the kernel tracepoint that fires whenever a process calls the openat() system call (used to open files).

{ printf("%s\n", comm); }: The action block. For every event, print the process name (comm) that triggered the syscall.

Every time any process calls openat(), bpftrace prints the name of that process.

This is useful for observing which processes are opening files in real time. It leverages Linux tracepoints, which are stable kernel instrumentation points, and uses bpftrace’s built-in variable comm (the current process name).

# bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", comm); }'
Attached 1 probe
irqbalance
gssproxy
rmcd
bash
sshd
systemd

This means those processes invoked openat() during tracing.

--------------------------------

NOTE:

Built-in variable comm: In bpftrace, comm is automatically populated with the command name of the current task (the process executing when the probe fires).

Execution flow: The action block { printf("%s\n", comm); } runs for every event. At that instant, the kernel context is the process making the syscall, so comm reflects that process name.
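
Other built-ins and the tracepoint's own arguments are available in the same action block. The sketch below is a hypothetical variation on the one-liner above that also prints the PID and the path being opened. It assumes the tracepoint's path argument is reachable as args->filename; newer bpftrace releases also accept args.filename.

```shell
# Hypothetical variation on the openat one-liner: print the process name,
# the PID, and the path passed to openat().
# Assumption: the tracepoint argument is reachable as args->filename.
OPENAT_PROG='tracepoint:syscalls:sys_enter_openat { printf("%-16s %-7d %s\n", comm, pid, str(args->filename)); }'

# Show the program text without running it.
printf '%s\n' "$OPENAT_PROG"
# To run it (needs root): bpftrace -e "$OPENAT_PROG"
```

str() copies the user-space string so it can be printed; without it the argument would appear as a raw pointer value.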

More examples

bpftrace -l '*sleep*'

Lists the probes whose names contain "sleep":

# bpftrace -l '*sleep*'
fentry:cls_flower:fl_destroy_sleepable
fentry:vmlinux:wq_worker_sleeping
fentry:vmlinux:zpool_can_sleep_mapped
kprobe:__bpf_prog_array_free_sleepable_cb
kprobe:__probestub_mm_compaction_kcompactd_sleep
kprobe:__probestub_mm_vmscan_kswapd_sleep
rawtracepoint:sunrpc:rpc_task_sleep
rawtracepoint:sunrpc:rpc_task_sync_sleep
tracepoint:syscalls:sys_exit_clock_nanosleep
tracepoint:syscalls:sys_exit_nanosleep
tracepoint:vmscan:mm_vmscan_kswapd_sleep
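
On a real system this listing can be long, so it helps to summarize it by probe type. The following is a small sketch: count_probe_types is a hypothetical helper that reads probe names on standard input and counts them by the prefix before the first colon.

```shell
# count_probe_types: hypothetical helper. Reads probe names (one per line)
# on stdin and prints "TYPE COUNT" per probe type, sorted by type name.
count_probe_types() {
    awk -F: 'NF > 1 { n[$1]++ } END { for (t in n) printf "%s %d\n", t, n[t] }' | sort
}

# Demo on a few lines taken from the listing above.
count_probe_types <<'EOF'
fentry:vmlinux:wq_worker_sleeping
kprobe:__probestub_mm_vmscan_kswapd_sleep
rawtracepoint:sunrpc:rpc_task_sleep
tracepoint:syscalls:sys_exit_nanosleep
tracepoint:vmscan:mm_vmscan_kswapd_sleep
EOF
# Live use (needs root): bpftrace -l '*sleep*' | count_probe_types
```

A missing fentry line in the summary is a quick hint that the kernel lacks usable BTF.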

======

bpftrace -e 'kprobe:do_nanosleep { printf("PID %d sleeping...\n", pid); }'

Traces processes that enter the kernel's do_nanosleep() function, that is, processes going to sleep:

# bpftrace -e 'kprobe:do_nanosleep { printf("PID %d sleeping...\n", pid); }'
Attached 1 probe
PID 846 sleeping...

===============

bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

Counts syscalls by process name. The map is printed when bpftrace exits (for example, on Ctrl-C):

# bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
Attached 1 probe
@[gssproxy]: 4
@[gmain]: 10
@[IBM.MgmtDomainR]: 10
@[auditd]: 17
@[systemd-userwor]: 27
@[rmcd]: 40
@[in:imjournal]: 48
@[irqbalance]: 56
@[bash]: 67
@[multipathd]: 134
@[bpftrace]: 223
@[vi]: 564
@[sshd-session]: 2542
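
When the map has many entries, only the busiest processes usually matter. The sketch below post-processes saved bpftrace output; top_syscallers is a hypothetical helper that assumes the @[name]: count line format shown above.

```shell
# top_syscallers [N]: hypothetical helper. Reads bpftrace map output on
# stdin (lines of the form "@[name]: count") and prints the N largest
# entries as "count name", highest first. N defaults to 3.
top_syscallers() {
    n="${1:-3}"
    sed -n 's/^@\[\(.*\)\]: \([0-9][0-9]*\)$/\2 \1/p' | sort -rn | head -n "$n"
}

# Demo on a few lines taken from the output above.
top_syscallers 2 <<'EOF'
@[auditd]: 17
@[bash]: 67
@[vi]: 564
@[sshd-session]: 2542
EOF
```

Lines that do not match the map format, such as "Attached 1 probe", are ignored, so the full bpftrace output can be piped in unchanged.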

=============================

Conclusion

Understanding probe types and BTF is essential for effective bpftrace usage. When BTF is missing or incomplete, fallback strategies such as the raw_syscalls tracepoints or a kernel source path keep tracing working. These insights help you troubleshoot errors and write efficient tracing scripts.

Friday, January 2, 2026
