Date:   Thu, 17 Aug 2023 08:44:03 -0700
From:   Ian Rogers <irogers@...gle.com>
To:     Arnaldo Carvalho de Melo <acme@...nel.org>
Cc:     Namhyung Kim <namhyung@...nel.org>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Andrii Nakryiko <andrii@...nel.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Athira Jajeev <atrajeev@...ux.vnet.ibm.com>,
        bpf@...r.kernel.org, Brendan Gregg <brendan.d.gregg@...il.com>,
        Carsten Haitzler <carsten.haitzler@....com>,
        Eduard Zingerman <eddyz87@...il.com>,
        Fangrui Song <maskray@...gle.com>,
        He Kuang <hekuang@...wei.com>, Ingo Molnar <mingo@...hat.com>,
        James Clark <james.clark@....com>,
        Jiri Olsa <jolsa@...nel.org>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Leo Yan <leo.yan@...aro.org>, llvm@...ts.linux.dev,
        Madhavan Srinivasan <maddy@...ux.ibm.com>,
        Mark Rutland <mark.rutland@....com>,
        Nathan Chancellor <nathan@...nel.org>,
        "Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ravi Bangoria <ravi.bangoria@....com>,
        Rob Herring <robh@...nel.org>,
        Tiezhu Yang <yangtiezhu@...ngson.cn>,
        Tom Rix <trix@...hat.com>, Wang Nan <wangnan0@...wei.com>,
        Wang ShaoBo <bobo.shaobowang@...wei.com>,
        Yang Jihong <yangjihong1@...wei.com>,
        Yonghong Song <yhs@...com>, YueHaibing <yuehaibing@...wei.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/1] perf trace: Use the augmented_raw_syscall BPF skel
 only for tracing syscalls

On Thu, Aug 17, 2023 at 8:37 AM Arnaldo Carvalho de Melo
<acme@...nel.org> wrote:
>
> It is possible to use 'perf trace' with tracepoints and in that case we
> can't initialize/use the augmented_raw_syscalls BPF skel.
>
> For instance, this use case:
>
>   # perf trace -e sched:*exec --max-events=5
>          ? (         ): NetworkManager/1183  ... [continued]: poll())                                             = 1
>      0.043 ( 0.007 ms): NetworkManager/1183 epoll_wait(epfd: 17<anon_inode:[eventpoll]>, events: 0x55555f90e920, maxevents: 6) = 0
>      0.060 ( 0.007 ms): NetworkManager/1183 write(fd: 3<anon_inode:[eventfd]>, buf: 0x7ffc5a27cd30, count: 8)     = 8
>      0.073 ( 0.005 ms): NetworkManager/1183 epoll_wait(epfd: 24<anon_inode:[eventpoll]>, events: 0x7ffc5a27cd20, maxevents: 2) = 1
>      0.082 ( 0.010 ms): NetworkManager/1183 recvmmsg(fd: 26<socket:[30298]>, mmsg: 0x7ffc5a27caa0, vlen: 8)       = 1
>   #
>
> Here we want to trace just the sched tracepoints ending in 'exec', but
> we end up tracing all syscalls.
>
> Fix it by checking the existing trace->trace_syscalls boolean to see if
> we need the augmenter.
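
The check described above can be modelled outside of perf. What follows is
only a minimal sketch: struct trace_sketch, struct skel_sketch and both
helper functions are hypothetical stand-ins, not the types or functions in
builtin-trace.c. It shows the two halves of the idea: skip the skeleton
setup when syscalls are not traced, and make later users tolerate a NULL
skeleton pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for perf's real types; only the shape matters. */
struct skel_sketch {
	int sys_enter_loaded;
};

struct trace_sketch {
	bool trace_syscalls;       /* true when syscalls were requested */
	struct skel_sketch *skel;  /* stays NULL when augmentation is skipped */
};

/* Set up the augmenter only when syscalls are traced. Returns 0 on success. */
static int setup_augmentation(struct trace_sketch *t, struct skel_sketch *storage)
{
	if (!t->trace_syscalls)
		return 0;          /* nothing to set up, t->skel stays NULL */

	storage->sys_enter_loaded = 1;
	t->skel = storage;
	return 0;
}

/* Later users must cope with a NULL skel before looking inside it. */
static bool needs_prog_array_init(const struct trace_sketch *t)
{
	return t->skel != NULL && t->skel->sys_enter_loaded != 0;
}
```

With trace_syscalls false, setup_augmentation() leaves skel NULL and
needs_prog_array_init() returns false without dereferencing it.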
>
> A follow-up patch will move the sections of code used only with the
> augmenter into separate functions, to make this cleaner and remove the
> goto, which is used here only to keep the patch easy to review.
>
> With this patch in place the previous behaviour is restored: no syscalls
> when we have other events and no syscall names:
>
>   [root@...co ~]# perf probe do_filp_open "filename=pathname->name:string"
>   Added new event:
>     probe:do_filp_open   (on do_filp_open with filename=pathname->name:string)
>
>   You can now use it in all perf tools, such as:
>
>           perf record -e probe:do_filp_open -aR sleep 1
>
>   [root@...co ~]# perf trace --max-events=10 -e probe:do_filp_open sleep 1
>      0.000 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/etc/ld.so.cache")
>      0.056 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/lib64/libc.so.6")
>      0.481 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/locale-archive")
>      0.501 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/share/locale/locale.alias")
>      0.572 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_IDENTIFICATION")
>      0.581 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.utf8/LC_IDENTIFICATION")
>      0.616 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib64/gconv/gconv-modules.cache")
>      0.656 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_MEASUREMENT")
>      0.664 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.utf8/LC_MEASUREMENT")
>      0.696 sleep/455122 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/en_US.UTF-8/LC_TELEPHONE")
>   [root@...co ~]#
>
> Mixing syscalls with tracepoints also works, with the syscall
> tracepoints augmented using the BPF skel:
>
>   [root@...co ~]# perf trace --max-events=10 -e open*,probe:do_filp_open sleep 1
>      0.000 (         ): sleep/455124 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ...
>      0.005 (         ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/etc/ld.so.cache")
>      0.000 ( 0.011 ms): sleep/455124  ... [continued]: openat())                                           = 3
>      0.031 (         ): sleep/455124 openat(dfd: CWD, filename: "/lib64/libc.so.6", flags: RDONLY|CLOEXEC) ...
>      0.033 (         ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/lib64/libc.so.6")
>      0.031 ( 0.006 ms): sleep/455124  ... [continued]: openat())                                           = 3
>      0.258 (         ): sleep/455124 openat(dfd: CWD, filename: "/usr/lib/locale/locale-archive", flags: RDONLY|CLOEXEC) ...
>      0.261 (         ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/lib/locale/locale-archive")
>      0.258 ( 0.006 ms): sleep/455124  ... [continued]: openat())                                           = -1 ENOENT (No such file or directory)
>      0.272 (         ): sleep/455124 openat(dfd: CWD, filename: "/usr/share/locale/locale.alias", flags: RDONLY|CLOEXEC) ...
>      0.273 (         ): sleep/455124 probe:do_filp_open(__probe_ip: -1186560412, filename: "/usr/share/locale/locale.alias")
>
> A final note: probe:do_filp_open uses a kprobe (probably optimized,
> as it is at the start of a function) that uses the kprobe_tracer mechanism
> in the kernel to collect the pathname->name string and stash it into the
> tracepoint created by 'perf probe' for that:
>
>   [root@...co ~]# cat /sys/kernel/debug/tracing/kprobe_events
>   p:probe/do_filp_open _text+4621920 filename=+0(+0(%si)):string
>   [root@...co ~]#
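
For readers unfamiliar with the fetch-arg syntax, the nested
+0(+0(%si)):string expression can be read as two loads. The sketch below
models it in userspace C. It assumes x86_64, where %si carries the second
function argument (pathname here), and that the name member sits at offset
0 of the structure it points to. struct filename_sketch and both helpers
are hypothetical names used for illustration only.

```c
#include <stddef.h>

/* Hypothetical mirror of the structure that pathname points to. */
struct filename_sketch {
	const char *name;  /* assumed to sit at offset 0, hence the inner +0 */
};

/* Models +OFFS(ADDR): load a pointer-sized value from ADDR + OFFS. */
static const void *fetch_ptr(const void *addr, size_t offs)
{
	return *(const void *const *)((const char *)addr + offs);
}

/* Models filename=+0(+0(%si)):string, with si standing in for the register. */
static const char *fetch_filename(const void *si)
{
	/* Inner +0(%si): read the pointer stored in pathname->name. */
	const void *name_ptr = fetch_ptr(si, 0);

	/* Outer +0(...) with :string: the string starts at that address. */
	return (const char *)name_ptr + 0;
}
```

Passing the address of a struct filename_sketch whose name member points
to a path returns that path, which is what ends up in the filename field
of the probe output shown earlier.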
>
> While the syscalls:sys_enter_openat tracepoint gets its string from a
> BPF program attached to raw_syscalls:sys_enter that tail calls into
> another BPF program that knows the types for the openat syscall args and
> thus can bpf_probe_read it right after the normal
> sys_enter/sys_enter_openat tracepoint payload that comes prefixed with
> whatever perf_event_open asked for (CPU, timestamp, etc):
>
>   [root@...co ~]# bpftool prog | grep -E "sys_enter |sys_enter_opena" -A3
>   3176: tracepoint  name sys_enter  tag 0bc3fc9d11754ba1  gpl
>         loaded_at 2023-08-17T12:32:20-0300  uid 0
>         xlated 272B  jited 257B  memlock 4096B  map_ids 2462,2466,2463
>         btf_id 2976
>   --
>   3180: tracepoint  name sys_enter_opena  tag 19dd077f00ec2f58  gpl
>           loaded_at 2023-08-17T12:32:20-0300  uid 0
>           xlated 328B  jited 206B  memlock 4096B  map_ids 2466,2465
>           btf_id 2976
>   [root@...co ~]#
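
The dispatch and payload layout described above can be approximated in
plain C. This is a sketch under stated assumptions, not the BPF code in
the skeleton: every struct, macro and function name below is hypothetical,
a function-pointer table stands in for the BPF tail call, and a plain byte
copy stands in for bpf_probe_read. The only point is the shape: a generic
sys_enter handler picks a per-syscall helper by number, and that helper
appends the string right after the fixed payload.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NR_ARGS   6
#define SKETCH_VALUE_MAX 64   /* hypothetical cap, kept small for the sketch */
#define SKETCH_NR_OPENAT 257  /* x86_64 syscall number, used as a table index */
#define SKETCH_NR_MAX    512

/* Fixed sys_enter payload (hypothetical layout). */
struct enter_args_sketch {
	long syscall_nr;
	uintptr_t args[SKETCH_NR_ARGS];
};

/* The fixed payload followed directly by the copied string. */
struct augmented_sketch {
	struct enter_args_sketch args;
	unsigned int size;            /* bytes used in value[], including NUL */
	char value[SKETCH_VALUE_MAX];
};

typedef int (*augmenter_fn)(struct augmented_sketch *out);

/* Knows that arg 1 of openat is a pathname and copies it after the payload. */
static int augment_openat(struct augmented_sketch *out)
{
	const char *path = (const char *)out->args.args[1];
	size_t len = 0;

	while (len < SKETCH_VALUE_MAX - 1 && path[len] != '\0')
		len++;
	memcpy(out->value, path, len);
	out->value[len] = '\0';
	out->size = (unsigned int)(len + 1);
	return 0;
}

/* Per-syscall helpers, indexed by syscall number. */
static augmenter_fn augmenters[SKETCH_NR_MAX] = {
	[SKETCH_NR_OPENAT] = augment_openat,
};

/* Generic entry point: dispatch by syscall number, as a tail call would. */
static int on_sys_enter(struct augmented_sketch *out)
{
	long nr = out->args.syscall_nr;

	out->size = 0;
	if (nr < 0 || nr >= SKETCH_NR_MAX || augmenters[nr] == NULL)
		return 0;             /* no helper: emit the plain payload */
	return augmenters[nr](out);
}
```

A syscall number without a registered helper leaves size at 0, so a
consumer can tell an augmented payload from a plain one.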
>
> Fixes: 42963c8bedeb864b ("perf trace: Migrate BPF augmentation to use a skeleton")
> Cc: Adrian Hunter <adrian.hunter@...el.com>
> Cc: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
> Cc: Andi Kleen <ak@...ux.intel.com>
> Cc: Andrii Nakryiko <andrii@...nel.org>
> Cc: Anshuman Khandual <anshuman.khandual@....com>
> Cc: Athira Jajeev <atrajeev@...ux.vnet.ibm.com>
> Cc: bpf@...r.kernel.org
> Cc: Brendan Gregg <brendan.d.gregg@...il.com>
> Cc: Carsten Haitzler <carsten.haitzler@....com>
> Cc: Eduard Zingerman <eddyz87@...il.com>
> Cc: Fangrui Song <maskray@...gle.com>
> Cc: He Kuang <hekuang@...wei.com>
> Cc: Ian Rogers <irogers@...gle.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: James Clark <james.clark@....com>
> Cc: Jiri Olsa <jolsa@...nel.org>
> Cc: Kan Liang <kan.liang@...ux.intel.com>
> Cc: Leo Yan <leo.yan@...aro.org>
> Cc: llvm@...ts.linux.dev
> Cc: Madhavan Srinivasan <maddy@...ux.ibm.com>
> Cc: Mark Rutland <mark.rutland@....com>
> Cc: Namhyung Kim <namhyung@...nel.org>
> Cc: Nathan Chancellor <nathan@...nel.org>
> Cc: Naveen N. Rao <naveen.n.rao@...ux.vnet.ibm.com>
> Cc: Nick Desaulniers <ndesaulniers@...gle.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Ravi Bangoria <ravi.bangoria@....com>
> Cc: Rob Herring <robh@...nel.org>
> Cc: Tiezhu Yang <yangtiezhu@...ngson.cn>
> Cc: Tom Rix <trix@...hat.com>
> Cc: Wang Nan <wangnan0@...wei.com>
> Cc: Wang ShaoBo <bobo.shaobowang@...wei.com>
> Cc: Yang Jihong <yangjihong1@...wei.com>
> Cc: Yonghong Song <yhs@...com>
> Cc: YueHaibing <yuehaibing@...wei.com>
> Link: https://lore.kernel.org/lkml/
> Signed-off-by: Arnaldo Carvalho de Melo <acme@...hat.com>

Reviewed-by: Ian Rogers <irogers@...gle.com>

Thanks,
Ian

> ---
>  tools/perf/builtin-trace.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index 0ebfa95895e0bf4d..3964cf44cdbcb3e8 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -3895,7 +3895,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
>         if (err < 0)
>                 goto out_error_open;
>  #ifdef HAVE_BPF_SKEL
> -       {
> +       if (trace->syscalls.events.bpf_output) {
>                 struct perf_cpu cpu;
>
>                 /*
> @@ -3916,7 +3916,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
>                 goto out_error_mem;
>
>  #ifdef HAVE_BPF_SKEL
> -       if (trace->skel->progs.sys_enter)
> +       if (trace->skel && trace->skel->progs.sys_enter)
>                 trace__init_syscalls_bpf_prog_array_maps(trace);
>  #endif
>
> @@ -4850,6 +4850,9 @@ int cmd_trace(int argc, const char **argv)
>         }
>
>  #ifdef HAVE_BPF_SKEL
> +       if (!trace.trace_syscalls)
> +               goto skip_augmentation;
> +
>         trace.skel = augmented_raw_syscalls_bpf__open();
>         if (!trace.skel) {
>                 pr_debug("Failed to open augmented syscalls BPF skeleton");
> @@ -4884,6 +4887,7 @@ int cmd_trace(int argc, const char **argv)
>         }
>         trace.syscalls.events.bpf_output = evlist__last(trace.evlist);
>         assert(!strcmp(evsel__name(trace.syscalls.events.bpf_output), "__augmented_syscalls__"));
> +skip_augmentation:
>  #endif
>         err = -1;
>
> --
> 2.41.0
>
