lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZzuH-8ZCa7BVt3ef@google.com>
Date: Mon, 18 Nov 2024 10:31:23 -0800
From: Namhyung Kim <namhyung@...nel.org>
To: Ian Rogers <irogers@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Jiri Olsa <jolsa@...nel.org>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Kan Liang <kan.liang@...ux.intel.com>,
	Athira Jajeev <atrajeev@...ux.vnet.ibm.com>,
	James Clark <james.clark@...aro.org>,
	Dominique Martinet <asmadeus@...ewreck.org>,
	Yang Li <yang.lee@...ux.alibaba.com>,
	Colin Ian King <colin.i.king@...il.com>,
	Yang Jihong <yangjihong@...edance.com>,
	"Steinar H. Gunderson" <sesse@...gle.com>,
	Oliver Upton <oliver.upton@...ux.dev>,
	Ilkka Koskinen <ilkka@...amperecomputing.com>,
	Ze Gao <zegao2021@...il.com>, Weilin Wang <weilin.wang@...el.com>,
	Ben Gainey <ben.gainey@....com>,
	zhaimingbing <zhaimingbing@...s.chinamobile.com>,
	Zixian Cai <fzczx123@...il.com>, Andi Kleen <ak@...ux.intel.com>,
	Paran Lee <p4ranlee@...il.com>,
	Thomas Falcon <thomas.falcon@...el.com>,
	linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
	"Steven Rostedt (Google)" <rostedt@...dmis.org>
Subject: Re: [PATCH v4 0/6] Avoid parsing tracepoint format just for id

On Wed, Nov 13, 2024 at 10:06:13AM -0800, Ian Rogers wrote:
> On Sat, Nov 9, 2024 at 9:04 AM Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > On Sat, Nov 09, 2024 at 08:26:20AM -0800, Ian Rogers wrote:
> > > On Fri, Nov 8, 2024 at 10:45 PM Namhyung Kim <namhyung@...nel.org> wrote:
> > > > On Fri, Nov 08, 2024 at 10:47:45AM -0800, Ian Rogers wrote:
> > > > > Ian Rogers (6):
> > > > >   tool api fs: Correctly encode errno for read/write open failures
> > > > >   perf trace-event: Constify print arguments
> > > > >   perf trace-event: Always build trace-event-info.c
> > > > >   perf evsel: Add/use accessor for tp_format
> > > > >   perf evsel: Allow evsel__newtp without libtraceevent
> > > > >   perf tests: Enable tests disabled due to tracepoint parsing
> > > >
> > > > After applying this series, I'm seeing some test failures.  But I don't
> > > > understand why it affects non-tracepoint events though.
> > > >
> > > >   $ sudo ./perf test -v pipe
> > > >   --- start ---
> > > >   test child forked, pid 3036123
> > > >    1bde35-1bdecc l noploop
> > > >   perf does have symbol 'noploop'
> > > >
> > > >   Record+report pipe test
> > > >   [ perf record: Woken up 1 times to write data ]
> > > >   [ perf record: Captured and wrote 0.210 MB - ]
> > > >   [ perf record: Woken up 2 times to write data ]
> > > >   [ perf record: Captured and wrote 0.517 MB - ]
> > > >   [ perf record: Woken up 2 times to write data ]
> > > >   [ perf record: Captured and wrote 0.516 MB - ]
> > > >   Record+report pipe test [Success]
> > > >
> > > >   Inject -B build-ids test
> > > >   0xa5c [0x17a4]: failed to process type: 80
> > > >   Error:
> > > >   failed to process sample
> > > >   Inject build-ids test [Failed - cannot find noploop function in pipe #1]
> > > >
> > > >   Inject -b build-ids test
> > > >   0xa5c [0x17a4]: failed to process type: 80
> > > >   Error:
> > > >   failed to process sample
> > > >   Inject build-ids test [Failed - cannot find noploop function in pipe #1]
> > > >
> > > >   Inject --buildid-all build-ids test
> > > >   0xa5c [0x17a4]: failed to process type: 80
> > > >   Error:
> > > >   failed to process sample
> > > >   Inject build-ids test [Failed - cannot find noploop function in pipe #1]
> > > >
> > > >   Inject --mmap2-buildid-all build-ids test
> > > >   0xa5c [0x17a4]: failed to process type: 80
> > > >   Error:
> > > >   failed to process sample
> > > >   Inject build-ids test [Failed - cannot find noploop function in pipe #1]
> > > >   ---- end(-1) ----
> > > >    84: perf pipe recording and injection test                          : FAILED!
> > > >
> > > >   $ sudo ./perf test -v Zstd
> > > >   --- start ---
> > > >   test child forked, pid 3036097
> > > >   Collecting compressed record file:
> > > >   500+0 records in
> > > >   500+0 records out
> > > >   256000 bytes (256 kB, 250 KiB) copied, 0.00169127 s, 151 MB/s
> > > >   [ perf record: Woken up 1 times to write data ]
> > > >   [ perf record: Captured and wrote 0.032 MB /tmp/perf.data.KBo, compressed (original 0.004 MB, ratio is 3.324) ]
> > > >   Checking compressed events stats:
> > > >   Couldn't decompress data
> > > >   0x7ca8 [0x4f2]: failed to process type: 81 [Operation not permitted]
> > > >   Error:
> > > >   failed to process sample
> > > >   ---- end(-1) ----
> > > >    86: Zstd perf.data compression/decompression                        : FAILED!
> > > >
> > > > Thanks,
> > > > Namhyung
> > >
> > > I'm not able to reproduce:
> > > ```
> > > $ git log --oneline |head
> > > a59bca6eb0a6 perf test: Add a runs-per-test flag
> > > 0d0c002eb45c perf tests: Enable tests disabled due to tracepoint parsing
> > > 4b8f5c9dfbda perf evsel: Allow evsel__newtp without libtraceevent
> > > 7f57057c7884 perf evsel: Add/use accessor for tp_format
> > > c27d357d2d4c perf trace-event: Always build trace-event-info.c
> > > 20bf7a2cd68a perf trace-event: Constify print arguments
> > > f18b07ee2af1 tool api fs: Correctly encode errno for read/write open failures
> > > ...
> > > $ sudo /tmp/perf/perf test -r 10 Zstd pipe -v
> > > 84: perf pipe recording and injection test                          : Ok
> > > 84: perf pipe recording and injection test                          : Ok
> > > 84: perf pipe recording and injection test                          : Ok
> > > 84: perf pipe recording and injection test                          : Ok
> > > 84: perf pipe recording and injection test                          : Ok
> > > 84: perf pipe recording and injection test                          : Ok
> > > 84: perf pipe recording and injection test                          : Ok
> > > 84: perf pipe recording and injection test                          : Ok
> > > 84: perf pipe recording and injection test                          : Ok
> > > 84: perf pipe recording and injection test                          : Ok
> > > 86: Zstd perf.data compression/decompression                        : Ok
> > > 86: Zstd perf.data compression/decompression                        : Ok
> > > 86: Zstd perf.data compression/decompression                        : Ok
> > > 86: Zstd perf.data compression/decompression                        : Ok
> > > 86: Zstd perf.data compression/decompression                        : Ok
> > > 86: Zstd perf.data compression/decompression                        : Ok
> > > 86: Zstd perf.data compression/decompression                        : Ok
> > > 86: Zstd perf.data compression/decompression                        : Ok
> > > 86: Zstd perf.data compression/decompression                        : Ok
> > > 86: Zstd perf.data compression/decompression                        : Ok
> > > ```
> > > Similarly not as root or if runs-per-test is 100.
> > >
> > > Agreed that the changes are for tracepoints and these tests aren't for
> > > tracepoints, so an interaction wouldn't be expected. If you have a
> > > reliable reproduction perhaps you can bisect it.
> >
> > it says:
> >
> > 9c10de391840a35ab095b65e9a5c4fad0ac52068 is the first bad commit
> > commit 9c10de391840a35ab095b65e9a5c4fad0ac52068 (HEAD)
> > Author: Ian Rogers <irogers@...gle.com>
> > Date:   Fri Nov 8 10:47:46 2024 -0800
> >
> >     tool api fs: Correctly encode errno for read/write open failures
> >
> >     Switch from returning -1 to -errno so that callers can determine types
> >     of failure.
> >
> >     Signed-off-by: Ian Rogers <irogers@...gle.com>
> >     Acked-by: Arnaldo Carvalho de Melo <acme@...hat.com>
> >
> >  tools/lib/api/fs/fs.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> So I tried to eye-ball/grep all callers to spot assumptions on the
> return value like:
> ```
> err = ...__read_int
> if (err == -1)
> ```
> Didn't spot anything.
> 
> It seems in the test log the record is failing so I ran this under
> gdb, set breakpoints on the 3 modified functions and then looked up
> the call stack to spot bad return value assumptions. Everything looks
> good.
> I then tried inject and report, the only file read by these functions
> is /proc/sys/kernel/perf_event_paranoid as part of symbol
> initialization (nit, this should probably be read lazily and the
> restriction should really come from the perf.data file, not the
> running system) and those calls look good.
> 
> The change is small and not critical for the series. It improves the
> error message when reading the tracepoint id fails. So we could move
> forward with the rest of the series, but that could be annoying for
> tracepoint users.
> 
> If I had a reproducer I'd revert the 1 line change on each function to
> find out which is causing the regression. Once you have that then you
> can binary search to find the bad call by using some global counter
> where the first 'n' calls use the new return value and the later use
> the old value. You can then vary 'n' to binary search and find the bad
> caller.
> 
> Is there any chance you can help diagnose this or help me to find the
> reproducer?

I think this depends on the system configuration.  I've debugged it
failed when it gets cpu topology:

  ...
  read int failed: /sys/devices/system/cpu/cpu112/topology/core_id (errno=2)
  read int failed: /sys/devices/system/cpu/cpu112/topology/physical_package_id (errno=2)
  read int failed: /sys/devices/system/cpu/cpu112/topology/die_id (errno=2)
  ...

Maybe it's because # online CPUs != # possible CPUs.

  $ cat /sys/devices/system/cpu/online 
  0-63
  $ cat /sys/devices/system/cpu/possible 
  0-127

There's a code like cpu__get_socket_id() to use the return value of
sysfs__read_int() directly.  And it saves the value to aggr_cpu_id which
requires exact match like in aggr_cpu_id__equal().

Thanks,
Namhyung


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ