lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fU_UUKmu0gTmODXCG5JUK6MRFOTFU+LCJTfA5yTP+aomA@mail.gmail.com>
Date: Mon, 19 May 2025 10:00:25 -0700
From: Ian Rogers <irogers@...gle.com>
To: Howard Chu <howardchu95@...il.com>
Cc: acme@...nel.org, mingo@...hat.com, namhyung@...nel.org, 
	mark.rutland@....com, alexander.shishkin@...ux.intel.com, jolsa@...nel.org, 
	adrian.hunter@...el.com, peterz@...radead.org, kan.liang@...ux.intel.com, 
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/5] perf test trace: Reduce test failures and make
 error messages verbose

On Mon, May 19, 2025 at 8:01 AM Howard Chu <howardchu95@...il.com> wrote:
>
> Hello Ian,
>
> I jumped to a conclusion, there are reasons other than running tests
> in parallel.
>
> On Sun, May 18, 2025 at 3:18 PM Ian Rogers <irogers@...gle.com> wrote:
> >
> > On Sun, May 18, 2025 at 12:10 PM Howard Chu <howardchu95@...il.com> wrote:
> > >
> > > Currently, BTF tests fail constantly, this series fixes two major reasons
> > > why they fail, and makes the error messages acquired when using '-vv'
> > > more verbose, so when they fail, one can easily diagnose the problem.
> > >
> > > Before:
> > >     $ sudo /tmp/perf test enum -vv
> > >     107: perf trace enum augmentation tests:
> > >     107: perf trace enum augmentation tests                              : Running
> > >     --- start ---
> > >     test child forked, pid 783533
> > >     Checking if vmlinux exists
> > >     Tracing syscall landlock_add_rule
> > >     Tracing non-syscall tracepoint syscall
> > >     ---- end(-1) ----
> > >     107: perf trace enum augmentation tests                              : FAILED!
> > >
> > > After:
> > >     $ sudo /tmp/perf test enum -vv
> > >     107: perf trace enum augmentation tests:
> > >     107: perf trace enum augmentation tests                              : Running
> > >     --- start ---
> > >     test child forked, pid 851658
> > >     Checking if vmlinux exists
> > >     Tracing syscall landlock_add_rule
> > >     Tracing non-syscall tracepoint timer:hrtimer_setup,timer:hrtimer_start
> > >     [tracepoint failure] Failed to trace tracepoint timer:hrtimer_setup,timer:hrtimer_start, output:
> > >     event syntax error: 'timer:hrtimer_setup,timer:hrtimer_start'
> > >                          \___ unknown tracepoint
> > >
> > >     Error:  File /sys/kernel/tracing//events/timer/hrtimer_setup not found.
> > >     Hint:   Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
> > >
> > >     Run 'perf list' for a list of valid events
> > >
> > >      Usage: perf trace [<options>] [<command>]
> > >         or: perf trace [<options>] -- <command> [<options>]
> > >         or: perf trace record [<options>] [<command>]
> > >         or: perf trace record [<options>] -- <command> [<options>]
> > >
> > >         -e, --event <event>   event/syscall selector. use 'perf list' to list available events---- end(-1) ----
> > >     107: perf trace enum augmentation tests                              : FAILED!
> > >
> > > Changes in v2:
> > > - Add an extra newline after error messages
> > > - Rename the title of patch 3 to 'Stop tracing hrtimer_setup...'
> > > - Take the debug diff for explanation out of patch 5 to make it apply
> > >   normally
> > > - Add base-commit in this cover letter
> >
> > Thanks Howard! I did some testing but see failures that may be
> > pre-existing issues:
> >
> > ```
> > --- start ---
> > test child forked, pid 264236
> > Checking if vmlinux exists
> > Tracing syscall landlock_add_rule
> > [syscall failure] Failed to trace syscall landlock_add_rule, output:
> >
>
> I think this doesn't have any output, a sign of failure of collection
> on the BPF side.
>
> > ---- end(-1) ----
> > 107: perf trace enum augmentation tests                              : FAILED!
> > --- start ---
> > test child forked, pid 264248
> > Checking if vmlinux BTF exists
> > Testing perf trace's string augmentation
> > String augmentation test failed, output:
> >
>
> Here too.
>
> > ---- end(-1) ----
> > --- start ---
> > test child forked, pid 278774
> > Checking if vmlinux exists
> > Tracing syscall landlock_add_rule
> > [syscall failure] Failed to trace syscall landlock_add_rule, output:
> >      0.000 (         ): perf/278843 landlock_add_rule(ruleset_fd: 11,
> > rule_type: LANDLOCK_RULE_PATH_BENEATH, rule_attr: 0x7ffdcf1dad90,
> > flags: 45) ...
>
> This is strange to me. No return value, no elapsed time. Judging from
> experience this comes from the lack of the second sys_exit event that
> sets both of them.
>
> > ---- end(-1) ----
> > 107: perf trace enum augmentation tests                              : FAILED!
> > --- start ---
> > test child forked, pid 279196
> > Checking if vmlinux exists
> > Tracing syscall landlock_add_rule
> > [syscall failure] Failed to trace syscall landlock_add_rule, output:
> >      0.000 (         ): perf/279262 landlock_add_rule(ruleset_fd: 11,
> > rule_type: LANDLOCK_RULE_PATH_BENEATH, rule_attr: 0x7fff37a70fd0,
> > flags: 45) ...
>
> Ditto.
>
> > ---- end(-1) ----
> > 107: perf trace enum augmentation tests                              : FAILED!
> > 108: perf trace BTF general tests                                    : FAILED!
> > --- start ---
> > test child forked, pid 278187
> > Checking if vmlinux BTF exists
> > Testing perf trace's string augmentation
> > Testing perf trace's buffer augmentation
> > Testing perf trace's struct augmentation
> > BTF struct augmentation test failed, output:
> > sleep/278352 clock_nanosleep(0, 0, {1,1,}, 0x7ffd31550f50) = 0
>
> This is the difference in libbpf's btf_dump's behavior. I grepped
> "grep -q -E "^sleep/[0-9]+ clock_nanosleep\(0, 0, \{1,\},
> 0x[0-9a-f]+\) += +[0-9]+$"", the {1, 1,} didn't match. On my machine
> it is '{1,}'
> ~~~
> sudo /tmp/perf trace --sort-events -e clock_nanosleep --force-btf
> --max-events=1 -- sleep 1
> sleep/338371 clock_nanosleep(0, 0, {1,}, 0x7ffeb4e9dd10) = 0
> ~~~
>
> Which is strange, I thought we used the same libbpf from the kernel
> tree and statically link it. Why is it different on our machines?
> FWIW, I do have libbpf installed on my system, from their github
> repo's main branch.
> ~~~
> $ cat /usr/include/bpf/libbpf_version.h
> /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> /* Copyright (C) 2021 Facebook */
> #ifndef __LIBBPF_VERSION_H
> #define __LIBBPF_VERSION_H
>
> #define LIBBPF_MAJOR_VERSION 1
> #define LIBBPF_MINOR_VERSION 6
>
> #endif /* __LIBBPF_VERSION_H */
> ~~~
>
> But I thought we used the same 1.6.x version from the tree...
> ~~~
> $ cat tools/lib/bpf/libbpf_version.h
> /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> /* Copyright (C) 2021 Facebook */
> #ifndef __LIBBPF_VERSION_H
> #define __LIBBPF_VERSION_H
>
> #define LIBBPF_MAJOR_VERSION 1
> #define LIBBPF_MINOR_VERSION 6
>
> #endif /* __LIBBPF_VERSION_H */
> ~~~
>
> > ---- end(-1) ----
> > 108: perf trace BTF general tests                                    : FAILED!
> > --- start ---
> > test child forked, pid 278775
> > Checking if vmlinux BTF exists
> > Testing perf trace's string augmentation
> > Testing perf trace's buffer augmentation
> > Buffer augmentation test failed, output:
> > buffer content
>
> This should have two lines of output, the second line is missing,
> something to do with the BPF collection I mentioned above.
> ~~~
> $ sudo /tmp/perf/perf trace --sort-events -e write --force-btf
> --max-events=1 -- echo "buffer content"
> buffer content
> echo/393319 write(1, buffer content\10, 15) = 15
> ~~~
>
> > ---- end(-1) ----
> > 108: perf trace BTF general tests                                    : FAILED!
> > --- start ---
> > test child forked, pid 279547
> > Checking if vmlinux BTF exists
> > Testing perf trace's string augmentation
> > Testing perf trace's buffer augmentation
> > Testing perf trace's struct augmentation
> > BTF struct augmentation test failed, output:
> > sleep/279619 clock_nanosleep(0, 0, {1,1,}, 0x7ffcd47b6450) = 0
>
> Same '{1,1,}' appears.
>
> > ---- end(-1) ----
> > 108: perf trace BTF general tests                                    : FAILED!
> > --- start ---
> > test child forked, pid 264252
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.333 MB /tmp/temporary_file.pl459 ]
> > Failed: cannot find *nanosleep syscall
>
> This test was written by Namhyung, again it failed to collect some samples.
>
> > ---- end(-1) ----
> > 110: perf trace record and replay                                    : FAILED!
> > --- start ---
> > test child forked, pid 278193
> > testing: perf trace -s -- true
> > Error: cannot find enough pattern ^\s*(open|read|close).*[0-9]+%$ in the output
>
> May also be a loss of samples.
>
> There are some problems. From the output you provided, I can see the
> obvious sample loss in perf trace, and the differing behavior in
> libbpf's btf_dump. Fixing them.

Massively appreciated! I was running the tests with `perf test -v
"perf trace"` so there was parallelism. Running the tests with your
changes and with "-S" for sequential I still reliably see:
```
--- start ---
test child forked, pid 279547
Checking if vmlinux BTF exists
Testing perf trace's string augmentation
Testing perf trace's buffer augmentation
Testing perf trace's struct augmentation
BTF struct augmentation test failed, output:
sleep/279619 clock_nanosleep(0, 0, {1,1,}, 0x7ffcd47b6450) = 0
---- end(-1) ----
108: perf trace BTF general tests                                    : FAILED!
```
but the other problems appear to go away.

Thanks,
Ian

> Thanks,
> Howard

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ