lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fWZXPjD3Ok5XmMwwaYt+9mL+V8t8fNSUdf-F5PPiEAvrg@mail.gmail.com>
Date: Thu, 20 Feb 2025 09:12:46 -0800
From: Ian Rogers <irogers@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>, Kan Liang <kan.liang@...ux.intel.com>, 
	Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>, 
	Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>, 
	LKML <linux-kernel@...r.kernel.org>, linux-perf-users@...r.kernel.org, 
	Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH] perf report: Add 'tgid' sort key

On Wed, Feb 19, 2025 at 1:10 PM Namhyung Kim <namhyung@...nel.org> wrote:
>
> On Wed, Feb 19, 2025 at 03:47:44PM +0100, Arnaldo Carvalho de Melo wrote:
> > On Wed, Feb 19, 2025 at 03:37:10PM +0100, Arnaldo Carvalho de Melo wrote:
> > > On Tue, Feb 18, 2025 at 02:03:01PM -0800, Namhyung Kim wrote:
> > > > On Tue, Feb 18, 2025 at 10:01:33PM +0100, Arnaldo Carvalho de Melo wrote:
> > > > > On Tue, Feb 18, 2025 at 09:36:52PM +0100, Arnaldo Carvalho de Melo wrote:
> > > > > > So the call to maps_fixup_end() will set maps->end_broken to false,
> > > > > > since it fixed up the map ends, etc, but then we insert more maps with
> > > > > > broken ends:
> > > > >
> > > > > > #6  0x0000000000633d52 in check_invariants (maps=0xf967c0) at util/maps.c:95
> > > > > > 95                                            assert(map__end(prev) <= map__end(map));
> > > > > > (gdb) p prev->dso->name
> > > > > > $1 = 0xfc47ab "bpf_trampoline_6442522522"
> > > > >
> > > > > So the above map is created overlapping a previously existing map:
> > > > >
> > > > > root@...ber:~# perf probe -l
> > > > >   probe_perf:maps_fixup_end (on maps__fixup_end:1@...l/maps.c in /home/acme/bin/perf with maps)
> > > > >   probe_perf:maps_insert (on maps__insert:1@...l/maps.c in /home/acme/bin/perf with maps name start end)
> > > > > root@...ber:~#
> > > > >
> > > > > root@...ber:~# perf trace --lib -e probe_perf:maps* perf record sleep
> > > > > <SNIP>
> > > > >    319.791 perf/1732173 probe_perf:maps_insert((634e5e) maps=0x2d9715d0 name="bpf_prog_6deef7357e7b4530_sd_fw_egress" start=0xffffffffc0160788 end=0xffffffffc01607c8)
> > > > >    319.810 perf/1732173 probe_perf:maps_insert((634e5e) maps=0x2d9715d0 name="bpf_prog_6deef7357e7b4530_sd_fw_ingress" start=0xffffffffc01647b8 end=0xffffffffc01647f8)
> > > > >    319.822 perf/1732173 probe_perf:maps_insert((634e5e) maps=0x2d9715d0 name="bpf_prog_6deef7357e7b4530_sd_fw_egress" start=0xffffffffc016482c end=0xffffffffc016486c)
> > > > >    319.834 perf/1732173 probe_perf:maps_insert((634e5e) maps=0x2d9715d0 name="bpf_prog_6deef7357e7b4530_sd_fw_ingress" start=0xffffffffc01648ac end=0xffffffffc01648ec)
> > > > >    319.845 perf/1732173 probe_perf:maps_insert((634e5e) maps=0x2d9715d0 name="bpf_prog_be31ae23198a0378_sd_devices" start=0xffffffffc0186388 end=0xffffffffc01864b2)
> > > > >    319.857 perf/1732173 probe_perf:maps_insert((634e5e) maps=0x2d9715d0 name="bpf_trampoline_6442522522" start=0xffffffffc0147640 end=0xffffffffc0148640)
> > > > > [ perf record: Captured and wrote 0.035 MB perf.data (7 samples) ]
> > > > > perf: util/maps.c:95: check_invariants: Assertion `map__end(prev) <= map__end(map)' failed.
> > > > > root@...ber:~#
> > > > >
> > > > > So a PERF_RECORD_KSYMBOL processing will add a map for
> > > > > "bpf_trampoline_6442522522" that has its start after before the
> > > > > "bpf_prog_40ddf486530245f5_sd_devices" start, ok, but ends after
> > > > > "bpf_prog_40ddf486530245f5_sd_devices", overlapping it.
> > > > >
> > > > > machine__process_ksymbol_register() does:
> > > > >
> > > > > 713                     map__set_start(map, event->ksymbol.addr);
> > > > > 714                     map__set_end(map, map__start(map) + event->ksymbol.len);
> > > > > 715                     err = maps__insert(machine__kernel_maps(machine), map);
> > > > >
> > > > > And:
> > > > >
> > > > > (gdb) p /x event->ksymbol.addr
> > > > > $2 = 0xffffffffc0147a2c
> > > > > (gdb) p event->ksymbol.len
> > > > > $3 = 306
> > > >
> > > > Hmm.. so I think the situation is like below.
> > > >
> > > >                    (bpf_trampoline_6442522522)
> > > >             +---------------------------------------+
> > > >       |                                       |
> > > >       |       +------------------------+      |
> > > >       |       | (bpf_prog_40ddf486...) | <----+----  adding this
> > > >       |       |                        |      |
> > > >       |       |                        |      |
> > > >       |   c0147a2c                            |
> > > >       |                                       |
> > > >         c0147640                                 c0148640
> > > >
> > > > And it failed to add bpf_prog_40ddf486... in check_invariants() because
> > > > the end address is smaller than the previous map.
> > >
> > > No, it didn't fail to add, it managed to do it which left the kernel
> > > maps in a broken state, with overlappings while it had a cleared
> > > ends_broken, then, later, when the checks_invariant is finally called at
> > > perf record exit time:
> >
> > Nope, __maps__insert() should notice that the ends are broken and set
> > it:
> >
> >         if (nr_maps == 1) {
> >                 /* If there's just 1 entry then maps are sorted. */
> >                 maps__set_maps_by_address_sorted(maps, true);
> >                 maps__set_maps_by_name_sorted(maps, maps_by_name != NULL);
> >         } else {
> >                 /* Sorted if maps were already sorted and this map starts after the last one. */
> >                 maps__set_maps_by_address_sorted(maps,
> >                         maps__maps_by_address_sorted(maps) &&
> >                         map__end(maps_by_address[nr_maps - 2]) <= map__start(new));
> >                 maps__set_maps_by_name_sorted(maps, false);
> >         }
> >         if (map__end(new) < map__start(new))
> >                 RC_CHK_ACCESS(maps)->ends_broken = true;
> >
> >
> > humm, RC_CHK_ACCESS(maps)->ends_broken should be set for the case we
> > have and I think it isn't being... Then the bpf trampoline map that is
> > the last entry to be added is before the last entry and thus
> > maps_by_address_sorted is set to false, ends_broken continues false and
> > at the end maps_by_address_sorted is set to true and the last
> > check_invariants triggerrs the asserts...
>
> Right, probably it needs to set the ends_broken when the end address of
> the new map is smaller than the previous (but the start address is
> bigger) and fixup the end address when it sorts the maps by address.

Ugh, I get git blamed for ends_broken and I was wondering what the heck it is:
https://lore.kernel.org/all/20240210031746.4057262-2-irogers@google.com/
My memory is that when the rb-tree was built the maps put in it could
be broken and ends_broken was to capture we were in this state as the
sorting would get broken, invariants be off, etc.. The rb-tree
constructing code would then call maps__fixup_end. Having the caller
call maps__fixup_end seems error prone, as does the whole
"ends_broken" thing - remember I was in the code to fix memory leaks
so modifying the maps API wasn't front of mind. I added ends_broken,
the original rb-tree had no notion of it, because I was trying to get
the invariants right for the testing I could do and ends_broken was
the pragmatic thing to do for odd cases like kernel modules before
maps__fixup_end is called.

The maps API has evolved and we have a pretty robust, but possibly not
fast, maps__fixup_overlap_and_insert:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/maps.h?h=perf-tools-next#n69
I think ideally we'd make maps__insert uphold the invariants and not
have ends_broken. I'm worried that making ends_broken more load
bearing isn't the right thing to do, we may even be able to not have
the variable for the "ifndef NDEBUG" case, which making it load
bearing would completely defeat.

So I think the fix here should be to understand the maps construction
code for the modules, try to work out why maps__fixup_end wasn't
called, perhaps migrate the code to maps__fixup_overlap_and_insert or
add a missed maps__fixup_end call.

Given the blame I kind of feel responsible for this, but the real
issue is adding the invariant checks has caught a latent bug that the
rb-tree code would have just ignored and possibly been broken as a
consequence. I lack bandwidth and a reproduction so thank you for
digging into this.

Thanks,
Ian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ