[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fUX78z+i5sKcZVAV1prWhA0wzNSK2zkzg70ERaLVDLFAQ@mail.gmail.com>
Date: Thu, 17 Aug 2023 08:44:03 -0700
From: Ian Rogers <irogers@...gle.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Namhyung Kim <namhyung@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>,
Andrii Nakryiko <andrii@...nel.org>,
Anshuman Khandual <anshuman.khandual@....com>,
Athira Jajeev <atrajeev@...ux.vnet.ibm.com>,
bpf@...r.kernel.org, Brendan Gregg <brendan.d.gregg@...il.com>,
Carsten Haitzler <carsten.haitzler@....com>,
Eduard Zingerman <eddyz87@...il.com>,
Fangrui Song <maskray@...gle.com>,
He Kuang <hekuang@...wei.com>, Ingo Molnar <mingo@...hat.com>,
James Clark <james.clark@....com>,
Jiri Olsa <jolsa@...nel.org>,
Kan Liang <kan.liang@...ux.intel.com>,
Leo Yan <leo.yan@...aro.org>, llvm@...ts.linux.dev,
Madhavan Srinivasan <maddy@...ux.ibm.com>,
Mark Rutland <mark.rutland@....com>,
Nathan Chancellor <nathan@...nel.org>,
"Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Ravi Bangoria <ravi.bangoria@....com>,
Rob Herring <robh@...nel.org>,
Tiezhu Yang <yangtiezhu@...ngson.cn>,
Tom Rix <trix@...hat.com>, Wang Nan <wangnan0@...wei.com>,
Wang ShaoBo <bobo.shaobowang@...wei.com>,
Yang Jihong <yangjihong1@...wei.com>,
Yonghong Song <yhs@...com>, YueHaibing <yuehaibing@...wei.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/1] perf trace: Use the augmented_raw_syscall BPF skel
only for tracing syscalls
On Thu, Aug 17, 2023 at 8:37 AM Arnaldo Carvalho de Melo
<acme@...nel.org> wrote:
>
> It is possible to use 'perf trace' with tracepoints and in that case we
> can't initialize/use the augmented_raw_syscalls BPF skel.
>
> For instance, this usecase:
>
> # perf trace -e sched:*exec --max-events=5
> ? ( ): NetworkManager/1183 ... [continued]: poll()) = 1
> 0.043 ( 0.007 ms): NetworkManager/1183 epoll_wait(epfd: 17<anon_inode:[eventpoll]>, events: 0x55555f90e920, maxevents: 6) = 0
> 0.060 ( 0.007 ms): NetworkManager/1183 write(fd: 3<anon_inode:[eventfd]>, buf: 0x7ffc5a27cd30, count: 8) = 8
> 0.073 ( 0.005 ms): NetworkManager/1183 epoll_wait(epfd: 24<anon_inode:[eventpoll]>, events: 0x7ffc5a27cd20, maxevents: 2) = 1
> 0.082 ( 0.010 ms): NetworkManager/1183 recvmmsg(fd: 26<socket:[30298]>, mmsg: 0x7ffc5a27caa0, vlen: 8) = 1
> #
>
> Where we want to trace just some sched tracepoints ending in 'exec' ends
> up tracing all syscalls.
>
> Fix it by checking existing trace->trace_syscalls boolean to see if we
> need the augmenter.
>
> A followup patch will move those sections of code used only with the
> augmenter to separate functions, to get it cleaner and remove the goto,
> done just for reviewing purposes.
>
> With this patch in place the previous behaviour is restored: no syscalls
> when we have other events and no syscall names:
>
> [root@...co ~]# perf probe do_filp_open "filename=pathname->name:string"
> Added new event:
> probe:do_filp_open (on do_filp_open with filename=pathname->name:string)
>
> You can now use it in all perf tools, such as:
>
> perf record -e probe:do_filp_open -aR sleep 1
>
> [root@...co ~]# perf trace --max-events=10 -e probe:do_filp_open sleep 1
> 0.000 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/etc/ld.so.cache")
> 0.056 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/lib64/libc.so.6")
> 0.481 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/locale-archive")
> 0.501 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/share/locale/locale.alias")
> 0.572 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_IDENTIFICATION")
> 0.581 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.utf8/LC_IDENTIFICATION")
> 0.616 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib64/gconv/gconv-modules.cache")
> 0.656 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_MEASUREMENT")
> 0.664 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.utf8/LC_MEASUREMENT")
> 0.696 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_TELEPHONE")
> [root@...co ~]#
>
> As well as mixing syscalls with tracepoints, getting the syscall
> tracepoints used augmented using the BPF skel:
>
> [root@...co ~]# perf trace --max-events=10 -e open*,probe:do_filp_open sleep 1
> 0.000 ( ): sleep/455124 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ...
> 0.005 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/etc/ld.so.cache")
> 0.000 ( 0.011 ms): sleep/455124 ... [continued]: openat()) = 3
> 0.031 ( ): sleep/455124 openat(dfd: CWD, filename: "/lib64/libc.so.6", flags: RDONLY|CLOEXEC) ...
> 0.033 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/lib64/libc.so.6")
> 0.031 ( 0.006 ms): sleep/455124 ... [continued]: openat()) = 3
> 0.258 ( ): sleep/455124 openat(dfd: CWD, filename: "/usr/lib/locale/locale-archive", flags: RDONLY|CLOEXEC) ...
> 0.261 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/locale-archive")
> 0.258 ( 0.006 ms): sleep/455124 ... [continued]: openat()) = -1 ENOENT (No such file or directory)
> 0.272 ( ): sleep/455124 openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY|CLOEXEC) ...
> 0.273 ( ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/share/locale/locale.alias")
>
> A final note: the probe:do_filp_open uses a kprobe (probably optimized
> as its in the start of a function) that uses the kprobe_tracer mechanism
> in the kernel to collect the pathname->name string and stash it into the
> tracepoint created by 'perf probe' for that:
>
> [root@...co ~]# cat /sys/kernel/debug/tracing/kprobe_events
> p:probe/do_filp_open _text+4621920 filename=+0(+0(%si)):string
> [root@...co ~]#
>
> While the syscalls:sys_enter_openat tracepoint gets its string from a
> BPF program attached to raw_syscalls:sys_enter that tail calls into
> another BPF program that knows the types for the openat syscall args and
> thus can bpf_probe_read it right after the normal
> sys_enter/sys_enter_openat tracepoint payload that comes prefixed with
> whatever perf_event_open asked for (CPU, timestamp, etc):
>
> [root@...co ~]# bpftool prog | grep -E "sys_enter |sys_enter_opena" -A3
> 3176: tracepoint name sys_enter tag 0bc3fc9d11754ba1 gpl
> loaded_at 2023-08-17T12:32:20-0300 uid 0
> xlated 272B jited 257B memlock 4096B map_ids 2462,2466,2463
> btf_id 2976
> --
> 3180: tracepoint name sys_enter_opena tag 19dd077f00ec2f58 gpl
> loaded_at 2023-08-17T12:32:20-0300 uid 0
> xlated 328B jited 206B memlock 4096B map_ids 2466,2465
> btf_id 2976
> [root@...co ~]#
>
> Fixes: 42963c8bedeb864b ("perf trace: Migrate BPF augmentation to use a skeleton")
> Cc: Adrian Hunter <adrian.hunter@...el.com>
> Cc: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
> Cc: Andi Kleen <ak@...ux.intel.com>
> Cc: Andrii Nakryiko <andrii@...nel.org>
> Cc: Anshuman Khandual <anshuman.khandual@....com>
> Cc: Athira Jajeev <atrajeev@...ux.vnet.ibm.com>
> Cc: bpf@...r.kernel.org
> Cc: Brendan Gregg <brendan.d.gregg@...il.com>
> Cc: Carsten Haitzler <carsten.haitzler@....com>
> Cc: Eduard Zingerman <eddyz87@...il.com>
> Cc: Fangrui Song <maskray@...gle.com>
> Cc: He Kuang <hekuang@...wei.com>
> Cc: Ian Rogers <irogers@...gle.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: James Clark <james.clark@....com>
> Cc: Jiri Olsa <jolsa@...nel.org>
> Cc: Kan Liang <kan.liang@...ux.intel.com>
> Cc: Leo Yan <leo.yan@...aro.org>
> Cc: llvm@...ts.linux.dev
> Cc: Madhavan Srinivasan <maddy@...ux.ibm.com>
> Cc: Mark Rutland <mark.rutland@....com>
> Cc: Namhyung Kim <namhyung@...nel.org>
> Cc: Nathan Chancellor <nathan@...nel.org>
> Cc: Naveen N. Rao <naveen.n.rao@...ux.vnet.ibm.com>
> Cc: Nick Desaulniers <ndesaulniers@...gle.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Ravi Bangoria <ravi.bangoria@....com>
> Cc: Rob Herring <robh@...nel.org>
> Cc: Tiezhu Yang <yangtiezhu@...ngson.cn>
> Cc: Tom Rix <trix@...hat.com>
> Cc: Wang Nan <wangnan0@...wei.com>
> Cc: Wang ShaoBo <bobo.shaobowang@...wei.com>
> Cc: Yang Jihong <yangjihong1@...wei.com>
> Cc: Yonghong Song <yhs@...com>
> Cc: YueHaibing <yuehaibing@...wei.com>
> Link: https://lore.kernel.org/lkml/
> Signed-off-by: Arnaldo Carvalho de Melo <acme@...hat.com>
Reviewed-by: Ian Rogers <irogers@...gle.com>
Thanks,
Ian
> ---
> tools/perf/builtin-trace.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index 0ebfa95895e0bf4d..3964cf44cdbcb3e8 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -3895,7 +3895,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
> if (err < 0)
> goto out_error_open;
> #ifdef HAVE_BPF_SKEL
> - {
> + if (trace->syscalls.events.bpf_output) {
> struct perf_cpu cpu;
>
> /*
> @@ -3916,7 +3916,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
> goto out_error_mem;
>
> #ifdef HAVE_BPF_SKEL
> - if (trace->skel->progs.sys_enter)
> + if (trace->skel && trace->skel->progs.sys_enter)
> trace__init_syscalls_bpf_prog_array_maps(trace);
> #endif
>
> @@ -4850,6 +4850,9 @@ int cmd_trace(int argc, const char **argv)
> }
>
> #ifdef HAVE_BPF_SKEL
> + if (!trace.trace_syscalls)
> + goto skip_augmentation;
> +
> trace.skel = augmented_raw_syscalls_bpf__open();
> if (!trace.skel) {
> pr_debug("Failed to open augmented syscalls BPF skeleton");
> @@ -4884,6 +4887,7 @@ int cmd_trace(int argc, const char **argv)
> }
> trace.syscalls.events.bpf_output = evlist__last(trace.evlist);
> assert(!strcmp(evsel__name(trace.syscalls.events.bpf_output), "__augmented_syscalls__"));
> +skip_augmentation:
> #endif
> err = -1;
>
> --
> 2.41.0
>
Powered by blists - more mailing lists