Message-ID: <Z61Qd7NG4k8DDvc6@google.com>
Date: Wed, 12 Feb 2025 17:52:55 -0800
From: Namhyung Kim <namhyung@...nel.org>
To: Ian Rogers <irogers@...gle.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
	Kan Liang <kan.liang@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
	linux-perf-users@...r.kernel.org,
	Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH] perf report: Add 'tgid' sort key

On Wed, Feb 12, 2025 at 02:10:42PM -0800, Ian Rogers wrote:
> On Wed, Feb 12, 2025 at 1:59 PM Ian Rogers <irogers@...gle.com> wrote:
> >
> > On Wed, Feb 12, 2025 at 1:07 PM Arnaldo Carvalho de Melo
> > <acme@...nel.org> wrote:
> > >
> > > On Wed, Feb 12, 2025 at 10:05:27PM +0100, Arnaldo Carvalho de Melo wrote:
> > > > On Wed, Feb 05, 2025 at 04:01:37PM -0800, Namhyung Kim wrote:
> > > > > Sometimes we need to analyze the data in process level but current sort
> > > > > keys only work on thread level.  Let's add 'tgid' sort key for that as
> > > > > 'pid' is already taken for thread.
> > > > >
> > > > > This will look mostly the same, but it only uses tgid instead of tid.
> > > > > Here's an example of a process with two threads (thloop).
> > > > >
> > > > >   $ perf record -- perf test -w thloop
> > > >
> > > > Unrelated, but when building perf with DEBUG=1 and trying to test the
> > > > above I noticed:
> > > >
> > > > root@...ber:~# perf record -- perf test -w thloop
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 0.404 MB perf.data (7968 samples) ]
> > > > perf: util/maps.c:95: check_invariants: Assertion `map__end(prev) <= map__end(map)' failed.
> > > > Aborted (core dumped)
> > > > root@...ber:~# perf record -- perf test -w offcpu
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 0.040 MB perf.data (23 samples) ]
> > > > perf: util/maps.c:95: check_invariants: Assertion `map__end(prev) <= map__end(map)' failed.
> > > > Aborted (core dumped)
> > > > root@...ber:~#
> > > >
> > > > I have:
> > > >
> > > > ⬢ [acme@...lbox perf-tools-next]$ git log --oneline perf-tools-next/perf-tools-next..
> > > > 9de1ed6fa3b73cb1 (HEAD -> perf-tools-next) perf report: Add 'tgid' sort key
> > > > 23e98ede2a353530 perf trace: Add --summary-mode option
> > > > e6d6104625a3790b perf tools: Get rid of now-unused rb_resort.h
> > > > 173ec14e72ef4ed7 perf trace: Convert syscall_stats to hashmap
> > > > 66edfb5d404e743d perf trace: Allocate syscall stats only if summary is on
> > > > ca6637e1ea08e6f4 perf parse-events filter: Use evsel__find_pmu()
> > > > bd1ac4a678f7f2c8 perf bench evlist-open-close: Reduce scope of 2 variables
> > > > cd59081880e89df8 perf test: Add direct off-cpu test
> > > > 56cbd794c0c46ba9 perf record --off-cpu: Add --off-cpu-thresh option
> > > > 28d9b19c5455556f perf record --off-cpu: Dump the remaining samples in BPF's stack trace map
> > > > 2bc05b02743b50a7 perf script: Display off-cpu samples correctly
> > > > bfa457a621596947 perf record --off-cpu: Disable perf_event's callchain collection
> > > > eca732cc42d20266 perf evsel: Assemble offcpu samples
> > > > 74ce50e40c569e90 perf record --off-cpu: Dump off-cpu samples in BPF
> > > > e75f8ce63bfa6cb9 perf record --off-cpu: Preparation of off-cpu BPF program
> > > > 0ffab9d26971c91c perf record --off-cpu: Parse off-cpu event
> > > > efc3fe2070853b7d perf evsel: Expose evsel__is_offcpu_event() for future use
> > > > ⬢ [acme@...lbox perf-tools-next]$
> > > >
> > > > locally, that is the stuff I've been testing lately, doubt it is related
> > > > to these patches, I'll investigate later, have to go AFK, so FWIW as a
> > > > heads up.
> > >
> > > Had time to extract this, now going really AFK:
> > >
> > > [New Thread 0x7fffdf24c6c0 (LWP 580622)]
> > > [ perf record: Woken up 1 times to write data ]
> > > [ perf record: Captured and wrote 0.403 MB perf.data (7948 samples) ]
> > > [Thread 0x7fffdf24c6c0 (LWP 580622) exited]
> > > perf: util/maps.c:95: check_invariants: Assertion `map__end(prev) <= map__end(map)' failed.
> > >
> > > Thread 1 "perf" received signal SIGABRT, Aborted.
> > > Downloading 4.06 K source file /usr/src/debug/glibc-2.39-37.fc40.x86_64/nptl/pthread_kill.c
> > > __pthread_kill_implementation (threadid=<optimized out>, signo=signo@...ry=6, no_tid=no_tid@...ry=0) at pthread_kill.c:44
> > > 44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
> > > (gdb) bt
> > > #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@...ry=6, no_tid=no_tid@...ry=0) at pthread_kill.c:44
> > > #1  0x00007ffff6ea80a3 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
> > > #2  0x00007ffff6e4ef1e in __GI_raise (sig=sig@...ry=6) at ../sysdeps/posix/raise.c:26
> > > #3  0x00007ffff6e36902 in __GI_abort () at abort.c:79
> > > #4  0x00007ffff6e3681e in __assert_fail_base (fmt=0x7ffff6fc3bb8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@...ry=0x7bef08 "map__end(prev) <= map__end(map)",
> > >     file=file@...ry=0x7bedf8 "util/maps.c", line=line@...ry=95, function=function@...ry=0x7bf1c0 <__PRETTY_FUNCTION__.6> "check_invariants") at assert.c:96
> > > #5  0x00007ffff6e47047 in __assert_fail (assertion=0x7bef08 "map__end(prev) <= map__end(map)", file=0x7bedf8 "util/maps.c", line=95,
> > >     function=0x7bf1c0 <__PRETTY_FUNCTION__.6> "check_invariants") at assert.c:105
> > > #6  0x00000000006347a1 in check_invariants (maps=0xf987e0) at util/maps.c:95
> > > #7  0x0000000000635ae2 in maps__remove (maps=0xf987e0, map=0xf98a80) at util/maps.c:538
> > > #8  0x000000000062afd2 in machine__destroy_kernel_maps (machine=0xf98178) at util/machine.c:1176
> > > #9  0x000000000062b32b in machines__destroy_kernel_maps (machines=0xf98178) at util/machine.c:1238
> > > #10 0x00000000006388af in perf_session__destroy_kernel_maps (session=0xf97f60) at util/session.c:105
> > > #11 0x0000000000638df0 in perf_session__delete (session=0xf97f60) at util/session.c:248
> > > #12 0x0000000000431f18 in __cmd_record (rec=0xecace0 <record>, argc=4, argv=0x7fffffffde60) at builtin-record.c:2888
> > > #13 0x00000000004351fb in cmd_record (argc=4, argv=0x7fffffffde60) at builtin-record.c:4286
> > > #14 0x00000000004bd4d4 in run_builtin (p=0xecddc0 <commands+288>, argc=6, argv=0x7fffffffde60) at perf.c:351
> > > #15 0x00000000004bd77b in handle_internal_command (argc=6, argv=0x7fffffffde60) at perf.c:404
> > > #16 0x00000000004bd8d4 in run_argv (argcp=0x7fffffffdc4c, argv=0x7fffffffdc40) at perf.c:448
> > > #17 0x00000000004bdc1d in main (argc=6, argv=0x7fffffffde60) at perf.c:556
> > > (gdb)
> >
> > So my guess would be that something modified a map and broke the
> > invariants of maps_by_address/maps_by_name. It should be possible
> > to add more check_invariants() calls to work out where this happens.
> >
> > Thanks,
> > Ian
> 
> I also suspect this is a regression. If you could bisect to find the
> cause, then the fix is probably not to modify a map in place but to
> clone it, change it, and then reinsert it into the maps. The insert
> function, maps__fixup_overlap_and_insert, ensures that maps don't
> overlap one another, which is exactly what the invariant check is
> detecting. Fwiw, in the older rbtree code, invariant breakages like
> this were silently ignored, so we may have a latent bug :-(

Sorry, I also cannot reproduce it on my machine.  But I think it's
unrelated to this change since you saw it during record.  It'd be nice
if you could bisect.

And I think I forgot to call thread__put() for the leader thread. :)
Will update in v2.

Thanks,
Namhyung

