lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAH0uvog3kt=QR1T6uXMBmecDWsh-8UWKXukJ-rRk3in7fud4Eg@mail.gmail.com>
Date: Mon, 19 May 2025 08:01:10 -0700
From: Howard Chu <howardchu95@...il.com>
To: Ian Rogers <irogers@...gle.com>
Cc: acme@...nel.org, mingo@...hat.com, namhyung@...nel.org, 
	mark.rutland@....com, alexander.shishkin@...ux.intel.com, jolsa@...nel.org, 
	adrian.hunter@...el.com, peterz@...radead.org, kan.liang@...ux.intel.com, 
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/5] perf test trace: Reduce test failures and make
 error messages verbose

Hello Ian,

I jumped to a conclusion, there are reasons other than running tests
in parallel.

On Sun, May 18, 2025 at 3:18 PM Ian Rogers <irogers@...gle.com> wrote:
>
> On Sun, May 18, 2025 at 12:10 PM Howard Chu <howardchu95@...il.com> wrote:
> >
> > Currently, BTF tests fail constantly, this series fixes two major reasons
> > why they fail, and makes the error messages acquired when using '-vv'
> > more verbose, so when they fail, one can easily diagnose the problem.
> >
> > Before:
> >     $ sudo /tmp/perf test enum -vv
> >     107: perf trace enum augmentation tests:
> >     107: perf trace enum augmentation tests                              : Running
> >     --- start ---
> >     test child forked, pid 783533
> >     Checking if vmlinux exists
> >     Tracing syscall landlock_add_rule
> >     Tracing non-syscall tracepoint syscall
> >     ---- end(-1) ----
> >     107: perf trace enum augmentation tests                              : FAILED!
> >
> > After:
> >     $ sudo /tmp/perf test enum -vv
> >     107: perf trace enum augmentation tests:
> >     107: perf trace enum augmentation tests                              : Running
> >     --- start ---
> >     test child forked, pid 851658
> >     Checking if vmlinux exists
> >     Tracing syscall landlock_add_rule
> >     Tracing non-syscall tracepoint timer:hrtimer_setup,timer:hrtimer_start
> >     [tracepoint failure] Failed to trace tracepoint timer:hrtimer_setup,timer:hrtimer_start, output:
> >     event syntax error: 'timer:hrtimer_setup,timer:hrtimer_start'
> >                          \___ unknown tracepoint
> >
> >     Error:  File /sys/kernel/tracing//events/timer/hrtimer_setup not found.
> >     Hint:   Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
> >
> >     Run 'perf list' for a list of valid events
> >
> >      Usage: perf trace [<options>] [<command>]
> >         or: perf trace [<options>] -- <command> [<options>]
> >         or: perf trace record [<options>] [<command>]
> >         or: perf trace record [<options>] -- <command> [<options>]
> >
> >         -e, --event <event>   event/syscall selector. use 'perf list' to list available events---- end(-1) ----
> >     107: perf trace enum augmentation tests                              : FAILED!
> >
> > Changes in v2:
> > - Add an extra newline after error messages
> > - Rename the title of patch 3 to 'Stop tracing hrtimer_setup...'
> > - Take the debug diff for explanation out of patch 5 to make it apply
> >   normally
> > - Add base-commit in this cover letter
>
> Thanks Howard! I did some testing but see failures that may be
> pre-existing issues:
>
> ```
> --- start ---
> test child forked, pid 264236
> Checking if vmlinux exists
> Tracing syscall landlock_add_rule
> [syscall failure] Failed to trace syscall landlock_add_rule, output:
>

I think this doesn't have any output, a sign of failure of collection
on the BPF side.

> ---- end(-1) ----
> 107: perf trace enum augmentation tests                              : FAILED!
> --- start ---
> test child forked, pid 264248
> Checking if vmlinux BTF exists
> Testing perf trace's string augmentation
> String augmentation test failed, output:
>

Here too.

> ---- end(-1) ----
> --- start ---
> test child forked, pid 278774
> Checking if vmlinux exists
> Tracing syscall landlock_add_rule
> [syscall failure] Failed to trace syscall landlock_add_rule, output:
>      0.000 (         ): perf/278843 landlock_add_rule(ruleset_fd: 11,
> rule_type: LANDLOCK_RULE_PATH_BENEATH, rule_attr: 0x7ffdcf1dad90,
> flags: 45) ...

This is strange to me. No return value, no elapsed time. Judging from
experience this comes from the lack of the second sys_exit event that
sets both of them.

> ---- end(-1) ----
> 107: perf trace enum augmentation tests                              : FAILED!
> --- start ---
> test child forked, pid 279196
> Checking if vmlinux exists
> Tracing syscall landlock_add_rule
> [syscall failure] Failed to trace syscall landlock_add_rule, output:
>      0.000 (         ): perf/279262 landlock_add_rule(ruleset_fd: 11,
> rule_type: LANDLOCK_RULE_PATH_BENEATH, rule_attr: 0x7fff37a70fd0,
> flags: 45) ...

Ditto.

> ---- end(-1) ----
> 107: perf trace enum augmentation tests                              : FAILED!
> 108: perf trace BTF general tests                                    : FAILED!
> --- start ---
> test child forked, pid 278187
> Checking if vmlinux BTF exists
> Testing perf trace's string augmentation
> Testing perf trace's buffer augmentation
> Testing perf trace's struct augmentation
> BTF struct augmentation test failed, output:
> sleep/278352 clock_nanosleep(0, 0, {1,1,}, 0x7ffd31550f50) = 0

This is the difference in libbpf's btf_dump's behavior. I grepped
"grep -q -E "^sleep/[0-9]+ clock_nanosleep\(0, 0, \{1,\},
0x[0-9a-f]+\) += +[0-9]+$"", the {1, 1,} didn't match. On my machine
it is '{1,}'
~~~
sudo /tmp/perf trace --sort-events -e clock_nanosleep --force-btf
--max-events=1 -- sleep 1
sleep/338371 clock_nanosleep(0, 0, {1,}, 0x7ffeb4e9dd10) = 0
~~~

Which is strange, I thought we used the same libbpf from the kernel
tree and statically link it. Why is it different on our machines?
FWIW, I do have libbpf installed on my system, from their github
repo's main branch.
~~~
$ cat /usr/include/bpf/libbpf_version.h
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/* Copyright (C) 2021 Facebook */
#ifndef __LIBBPF_VERSION_H
#define __LIBBPF_VERSION_H

#define LIBBPF_MAJOR_VERSION 1
#define LIBBPF_MINOR_VERSION 6

#endif /* __LIBBPF_VERSION_H */
~~~

But I thought we used the same 1.6.x version from the tree...
~~~
$ cat tools/lib/bpf/libbpf_version.h
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/* Copyright (C) 2021 Facebook */
#ifndef __LIBBPF_VERSION_H
#define __LIBBPF_VERSION_H

#define LIBBPF_MAJOR_VERSION 1
#define LIBBPF_MINOR_VERSION 6

#endif /* __LIBBPF_VERSION_H */
~~~

> ---- end(-1) ----
> 108: perf trace BTF general tests                                    : FAILED!
> --- start ---
> test child forked, pid 278775
> Checking if vmlinux BTF exists
> Testing perf trace's string augmentation
> Testing perf trace's buffer augmentation
> Buffer augmentation test failed, output:
> buffer content

This should have two lines of output, the second line is missing,
something to do with the BPF collection I mentioned above.
~~~
$ sudo /tmp/perf/perf trace --sort-events -e write --force-btf
--max-events=1 -- echo "buffer content"
buffer content
echo/393319 write(1, buffer content\10, 15) = 15
~~~

> ---- end(-1) ----
> 108: perf trace BTF general tests                                    : FAILED!
> --- start ---
> test child forked, pid 279547
> Checking if vmlinux BTF exists
> Testing perf trace's string augmentation
> Testing perf trace's buffer augmentation
> Testing perf trace's struct augmentation
> BTF struct augmentation test failed, output:
> sleep/279619 clock_nanosleep(0, 0, {1,1,}, 0x7ffcd47b6450) = 0

Same '{1,1,}' appears.

> ---- end(-1) ----
> 108: perf trace BTF general tests                                    : FAILED!
> --- start ---
> test child forked, pid 264252
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.333 MB /tmp/temporary_file.pl459 ]
> Failed: cannot find *nanosleep syscall

This test was written by Namhyung, again it failed to collect some samples.

> ---- end(-1) ----
> 110: perf trace record and replay                                    : FAILED!
> --- start ---
> test child forked, pid 278193
> testing: perf trace -s -- true
> Error: cannot find enough pattern ^\s*(open|read|close).*[0-9]+%$ in the output

May also be a loss of samples.

There are some problems. From the output you provided, I can see the
obvious sample loss in perf trace, and the differing behavior in
libbpf's btf_dump. Fixing them.

Thanks,
Howard

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ