Friday, January 2, 2026

Understanding BPF Trace Probes and BTF: Practical Insights from Real-World Debugging

Introduction 

bpftrace is a powerful tool for dynamic tracing on Linux that lets developers and system engineers observe kernel and user-space events in real time. While working with bpftrace, you will run into different probe types and kernel features such as BTF (BPF Type Format). This post explains what these probes mean, why BTF matters, and how to troubleshoot common issues.

  1. What bpftrace probes mean

    • Explain probe types like tracepoint, rawtracepoint, kprobe, and fentry:
      • tracepoint: Stable kernel instrumentation points for syscalls and subsystems.
      • rawtracepoint: Low-level hooks for tracepoints with minimal decoding.
      • kprobe: Dynamic function entry probes for kernel symbols.
      • fentry: Modern BPF function entry probes using BTF type info.
  2. What is BTF and why it matters

    • BPF Type Format (BTF) provides kernel type metadata for BPF programs.
    • Enables automatic argument decoding and advanced probes like fentry.
    • How to check if BTF is present (/sys/kernel/btf/vmlinux) and what to do if missing (use BPFTRACE_KERNEL_SOURCE or simpler probes).
  3. Common errors and fixes

    • Example error: error: field has incomplete type 'const enum landlock_rule_type'
      • Cause: Incomplete type info due to missing or partial BTF.
      • Fix: Use the raw_syscalls tracepoints or point bpftrace to the kernel sources (BPFTRACE_KERNEL_SOURCE).
  4. Practical examples

    • bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", comm); }'
      Meaning: Prints the process name each time the openat() syscall is called.
    • Alternative for PPC/RHEL when BTF is incomplete: bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); } interval:s:5 { print(@); clear(@); }'
  5. Tips for running tests and scripts

    • How to run bpftrace tests (ctest) and functional one-liners.
    • How to handle duration (interval probe or -c 'sleep N')
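
The BTF check from item 2 of the outline can be sketched as a small shell function. This is only a minimal sketch: the function name btf_status and its optional path argument are hypothetical (the argument exists so the check can be pointed at another path), while /sys/kernel/btf/vmlinux and BPFTRACE_KERNEL_SOURCE come from the outline itself.

```shell
# Hypothetical helper: report whether kernel BTF is exposed at the given path.
# Defaults to /sys/kernel/btf/vmlinux, the location named in the outline.
btf_status() {
    btf_path=${1:-/sys/kernel/btf/vmlinux}
    if [ -r "$btf_path" ]; then
        echo "BTF found at $btf_path: fentry probes and typed arguments should be usable"
    else
        echo "BTF not found at $btf_path: use tracepoint/kprobe probes or set BPFTRACE_KERNEL_SOURCE"
    fi
}

# Check the running kernel.
btf_status
```

A present file does not rule out the incomplete-type error from item 3, since partial type information can still trip up a script, so treat a positive result as a hint rather than a guarantee.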


Background on BPF: 

BPF (Berkeley Packet Filter) started as a packet filtering mechanism in Unix systems but has evolved into eBPF (Extended BPF) in modern Linux kernels. eBPF is a technology that allows you to run sandboxed programs inside the kernel without changing kernel source code or loading kernel modules.

  • Key idea: eBPF programs are verified and JIT-compiled by the kernel, making them safe and efficient.
  • Capabilities: Observability, networking, security, and performance monitoring.

What is bpftrace?

bpftrace is a high-level front-end for eBPF. It provides a simple scripting language to attach probes to kernel/user events and collect data. It’s similar to DTrace but for Linux.

Why is it needed?

  • Traditional monitoring tools often lack deep kernel visibility.
  • eBPF allows low-overhead, dynamic tracing without rebooting or patching the kernel.
  • Useful for:
    • Performance analysis (CPU, I/O, latency)
    • Debugging production issues
    • Security auditing

When is it applied?

  • When you need real-time insights into kernel or application behavior.
  • Examples:
    • Trace system calls (openat, read, write)
    • Monitor network packets
    • Profile application performance without intrusive instrumentation

Who can use this feature?

  • System administrators: For troubleshooting and performance tuning.
  • Kernel developers: For debugging kernel internals.
  • SRE/DevOps engineers: For observability in production.
  • Security teams: For detecting anomalies and enforcing policies.

Examples of bpftrace commands

bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", comm); }'


Breaking the command down:
  • bpftrace -e '...': Runs an inline bpftrace program given in quotes.
  • tracepoint:syscalls:sys_enter_openat: Attaches a probe to the kernel tracepoint that fires whenever a process calls the openat() system call (used to open files).
  • { printf("%s\n", comm); }: The action block. For every event, it prints the name (comm) of the process that triggered the syscall.

Every time any process calls openat(), bpftrace prints the name of that process.
This is useful for observing which processes are opening files in real time. It leverages Linux tracepoints, which are stable kernel instrumentation points, and uses bpftrace’s built-in variable comm (the current process name).
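
A natural next step is to print which file is being opened as well. The sketch below writes a small script to disk; the file name openat_files.bt is arbitrary, and it assumes the tracepoint exposes a filename argument, which can be confirmed with bpftrace -lv. The args builtin and str() are standard bpftrace features, but field-access syntax has varied between releases (newer ones also accept args.filename), so take this as a starting point rather than a recipe tested on every version.

```shell
# Write a small bpftrace script; the file name is arbitrary.
cat > openat_files.bt <<'EOF'
// Print the process name and the path passed to openat().
// Assumes the tracepoint has a "filename" field; verify with:
//   bpftrace -lv 'tracepoint:syscalls:sys_enter_openat'
tracepoint:syscalls:sys_enter_openat
{
    printf("%-16s %s\n", comm, str(args->filename));
}
EOF

# Run as root and stop with Ctrl-C:
#   bpftrace openat_files.bt
```

Tracepoint arguments are described by the tracepoint's own format definition, so this form is a reasonable choice on systems where BTF is missing or incomplete.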

# bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", comm); }'
Attached 1 probe
irqbalance
gssproxy
rmcd
bash
sshd
systemd
Each line is the name of a process that invoked openat() while tracing was active.
--------------------------------
NOTE:
  • Built-in variable comm: In bpftrace, comm is automatically populated with the command name of the current task (the process executing when the probe fires).
  • Execution flow: The action block { printf("%s\n", comm); } runs for every event. At that instant, the kernel context is the process making the syscall, so comm reflects that process name.
More examples

    bpftrace -l '*sleep*'
        list probes containing "sleep"

    # bpftrace -l '*sleep*'
    fentry:cls_flower:fl_destroy_sleepable
    fentry:vmlinux:wq_worker_sleeping
    fentry:vmlinux:zpool_can_sleep_mapped
    kprobe:__bpf_prog_array_free_sleepable_cb
    kprobe:__probestub_mm_compaction_kcompactd_sleep
    kprobe:__probestub_mm_vmscan_kswapd_sleep
    rawtracepoint:sunrpc:rpc_task_sleep
    rawtracepoint:sunrpc:rpc_task_sync_sleep
    tracepoint:syscalls:sys_exit_clock_nanosleep
    tracepoint:syscalls:sys_exit_nanosleep
    tracepoint:vmscan:mm_vmscan_kswapd_sleep
    ======
    bpftrace -e 'kprobe:do_nanosleep { printf("PID %d sleeping...\n", pid); }'
        trace processes calling sleep

    # bpftrace -e 'kprobe:do_nanosleep { printf("PID %d sleeping...\n", pid); }'
    Attached 1 probe
    PID 846 sleeping...
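
The outline notes that fentry relies on BTF while kprobe does not. As a rough sketch, a wrapper script could pick the probe type at run time. The helper name pick_probe and its optional path argument are hypothetical, and the presence of BTF alone does not guarantee that a particular function can be attached with fentry, so it is worth confirming with bpftrace -l first.

```shell
# Hypothetical helper: print a probe spec for a kernel function, preferring
# fentry when kernel BTF is exposed and falling back to kprobe otherwise.
pick_probe() {
    func=$1
    btf_path=${2:-/sys/kernel/btf/vmlinux}
    if [ -r "$btf_path" ]; then
        printf 'fentry:%s\n' "$func"
    else
        printf 'kprobe:%s\n' "$func"
    fi
}

probe=$(pick_probe do_nanosleep)
echo "Would attach: $probe"

# As root, the chosen probe can be passed to bpftrace:
#   bpftrace -e "$probe { printf(\"PID %d sleeping...\\n\", pid); }"
```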


    ===============
    bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
        count syscalls by process name
    # bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
    Attached 1 probe
    ^C

    @[gssproxy]: 4
    @[gmain]: 10
    @[IBM.MgmtDomainR]: 10
    @[auditd]: 17
    @[systemd-userwor]: 27
    @[rmcd]: 40
    @[in:imjournal]: 48
    @[irqbalance]: 56
    @[bash]: 67
    @[multipathd]: 134
    @[bpftrace]: 223
    @[vi]: 564
    @[sshd-session]: 2542
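
Instead of stopping the run with Ctrl-C, the duration can be bounded, as the outline suggests. The sketch below uses an interval probe that calls exit() on its first tick; the file name syscount.bt and the 10-second window are arbitrary choices for illustration.

```shell
# Write a bpftrace script that counts syscalls for a fixed window.
cat > syscount.bt <<'EOF'
// Count syscalls per process name.
tracepoint:raw_syscalls:sys_enter
{
    @[comm] = count();
}

// Stop on the first 10-second tick; maps are printed on exit.
interval:s:10
{
    exit();
}
EOF

# Run as root:
#   bpftrace syscount.bt
# Alternative: let a child command bound the run instead of the interval probe:
#   bpftrace -c 'sleep 10' -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
```

The interval form keeps the time limit inside the script, while -c ties the run to the lifetime of another command, which is handy when the goal is to trace only while a specific workload runs.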

    =============================

    Conclusion 

    Understanding probe types and BTF is essential for using bpftrace effectively. When BTF is missing or incomplete, fallbacks such as the raw_syscalls tracepoints or pointing bpftrace at the kernel sources (BPFTRACE_KERNEL_SOURCE) keep tracing working. These techniques help you troubleshoot errors and write efficient tracing scripts.
