Message-ID: <ee0b3bb3-476b-4792-84e1-c53fa4dbabee@iogearbox.net>
Date: Mon, 20 Jan 2025 17:18:23 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Leo Yan <leo.yan@....com>, Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau
<martin.lau@...ux.dev>, Eduard Zingerman <eddyz87@...il.com>,
Song Liu <song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>,
John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, "David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Jesper Dangaard Brouer <hawk@...nel.org>,
James Clark <james.clark@...aro.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>, Ian Rogers <irogers@...gle.com>,
linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
Quentin Monnet <qmo@...nel.org>
Subject: Re: [PATCH v1] samples/bpf: Add a trace tool with perf PMU counters
Hi Leo,
On 1/19/25 4:33 PM, Leo Yan wrote:
> Developers might need to profile a program at fine granularity. E.g.,
> a use case is to account for the CPU cycles of a small program or of a
> specific function within the program.
>
> This commit introduces a small tool that uses an eBPF program to read
> the perf PMU counters for performance metrics. As a first step, four
> counters are supported with the '-e' option: cycles, instructions,
> branches, branch-misses.
>
> The '-r' option is provided to support raw event numbers. This option
> is mutually exclusive with the '-e' option: users pass either a raw
> event number or a counter name.
>
> The tool enables the counters for the entire trace session in free-run
> mode. It reads the starting counter values when the profiled program
> is scheduled in, and calculates the interval when the task is
> scheduled out. The advantage of this approach is that it excludes as
> much statistical noise (e.g. caused by the tool itself) as possible.
>
> The tool can support function-based tracing. By using the '-f' option,
> users can specify the traced function. The eBPF program enables tracing
> at the function entry and disables tracing upon exit from the function.
>
> The '-u' option can be specified for tracing user mode only.
>
> Below are several usage examples.
>
> Trace CPU cycles for the whole program:
>
> # ./trace_counter -e cycles -- /mnt/sort
> Or
> # ./trace_counter -e cycles /mnt/sort
> Create process for the workload.
> Enable the event cycles.
> Bubble sorting array of 3000 elements
> 551 ms
> Finished the workload.
> Event (cycles) statistics:
> +-----------+------------------+
> | CPU[0000] | 29093250 |
> +-----------+------------------+
> | CPU[0002] | 75672820 |
> +-----------+------------------+
> | CPU[0006] | 1067458735 |
> +-----------+------------------+
> Total : 1172224805
>
> Trace branches for the user mode only:
>
> # ./trace_counter -e branches -u -- /mnt/sort
> Create process for the workload.
> Enable the event branches.
> Bubble sorting array of 3000 elements
> 541 ms
> Finished the workload.
> Event (branches) statistics:
> +-----------+------------------+
> | CPU[0007] | 88112669 |
> +-----------+------------------+
> Total : 88112669
>
> Trace instructions for the 'bubble_sort' function:
>
> # ./trace_counter -f bubble_sort -e instructions -- /mnt/sort
> Create process for the workload.
> Enable the event instructions.
> Bubble sorting array of 3000 elements
> 541 ms
> Finished the workload.
> Event (instructions) statistics:
> +-----------+------------------+
> | CPU[0006] | 1169810201 |
> +-----------+------------------+
> Total : 1169810201
> Function (bubble_sort) duration statistics:
> Count : 5
> Minimum : 232009928
> Maximum : 236742006
> Average : 233962040
>
> Trace the raw event '0x5' (L1D_TLB_REFILL):
>
> # ./trace_counter -r 0x5 -u -- /mnt/sort
> Create process for the workload.
> Enable the raw event 0x5.
> Bubble sorting array of 3000 elements
> 540 ms
> Finished the workload.
> Event (0x5) statistics:
> +-----------+------------------+
> | CPU[0007] | 174 |
> +-----------+------------------+
> Total : 174
>
> Trace for the function and set CPU affinity for the profiled program:
>
> # ./trace_counter -f bubble_sort -x /mnt/sort -e cycles \
> -- taskset -c 2 /mnt/sort
> Create process for the workload.
> Enable the event cycles.
> Bubble sorting array of 3000 elements
> 619 ms
> Finished the workload.
> Event (cycles) statistics:
> +-----------+------------------+
> | CPU[0002] | 1169913056 |
> +-----------+------------------+
> Total : 1169913056
> Function (bubble_sort) duration statistics:
> Count : 5
> Minimum : 232054101
> Maximum : 236769623
> Average : 233982611
>
> The command above sets the CPU affinity with the taskset command. The
> profiled function 'bubble_sort' is in the executable '/mnt/sort', not
> in the taskset binary. The '-x' option is used to tell the tool the
> correct executable path.
>
> Signed-off-by: Leo Yan <leo.yan@....com>
> ---
> samples/bpf/Makefile | 7 +-
> samples/bpf/trace_counter.bpf.c | 222 +++++++++++++
> samples/bpf/trace_counter_user.c | 528 +++++++++++++++++++++++++++++++
> 3 files changed, 756 insertions(+), 1 deletion(-)
> create mode 100644 samples/bpf/trace_counter.bpf.c
> create mode 100644 samples/bpf/trace_counter_user.c
Thanks for this work! A few suggestions: the contents of samples/bpf/ are in the process of being
migrated into BPF selftests, given that they have been bit-rotting for quite some time, so we'd like
to migrate the missing coverage into BPF CI (see test_progs in tools/testing/selftests/bpf/). That
could be one option; an alternative is to extend bpftool for profiling BPF programs (see
47c09d6a9f67 ("bpftool: Introduce "prog profile" command")).