lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <CAPqJEFrpb6T1p52BujbansA4KOFxKmeOT+e0Eg4rSmN3UQVxgg@mail.gmail.com>
Date: Wed, 7 Aug 2024 10:53:11 +0800
From: Eric Lin <eric.lin@...ive.com>
To: Ian Rogers <irogers@...gle.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>, Athira Rajeev <atrajeev@...ux.vnet.ibm.com>, 
	Namhyung Kim <namhyung@...nel.org>, Peter Zijlstra <peterz@...radead.org>, 
	Ingo Molnar <mingo@...hat.com>, Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, Kan Liang <kan.liang@...ux.intel.com>, 
	James Clark <james.clark@....com>, linux-perf-users <linux-perf-users@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>, vincent.chen@...ive.com, 
	greentime.hu@...ive.com, Samuel Holland <samuel.holland@...ive.com>
Subject: Re: [PATCH] perf pmus: Fix duplicate events caused segfault

On Tue, Aug 6, 2024 at 5:56 PM Eric Lin <eric.lin@...ive.com> wrote:
>
>
> Hi Ian,
>
> On Tue, Aug 6, 2024 at 12:00 PM Ian Rogers <irogers@...gle.com> wrote:
> >
> > On Mon, Aug 5, 2024 at 8:29 PM Eric Lin <eric.lin@...ive.com> wrote:
> > >
> > >
> > > Hi Arnaldo,
> > >
> > > On Mon, Aug 5, 2024 at 11:43 PM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
> > > >
> > > > On Mon, Aug 05, 2024 at 07:54:33PM +0530, Athira Rajeev wrote:
> > > > >
> > > > >
> > > > > > On 4 Aug 2024, at 8:36 PM, Eric Lin <eric.lin@...ive.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Sun, Jul 21, 2024 at 11:44 PM Eric Lin <eric.lin@...ive.com> wrote:
> > > > > >>
> > > > > >> Hi Athira,
> > > > > >>
> > > > > >> On Sat, Jul 20, 2024 at 4:35 PM Athira Rajeev
> > > > > >> <atrajeev@...ux.vnet.ibm.com> wrote:
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>> On 19 Jul 2024, at 1:46 PM, Eric Lin <eric.lin@...ive.com> wrote:
> > > > > >>>>
> > > > > >>>> Currently, if vendor JSON files have two duplicate event names,
> > > > > >>>> the "perf list" command will trigger a segfault.
> > > > > >>>>
> > > > > >>>> In commit e6ff1eed3584 ("perf pmu: Lazily add JSON events"),
> > > > > >>>> pmu_events_table__num_events() gets the number of JSON events
> > > > > >>>> from table_pmu->num_entries, which includes duplicate events
> > > > > >>>> if there are duplicate event names in the JSON files.
> > > > > >>>
> > > > > >>> Hi Eric,
> > > > > >>>
> > > > > >>> Let us consider there are duplicate event names in the JSON files, say :
> > > > > >>>
> > > > > >>> metric.json with: EventName as pmu_cache_miss, EventCode as 0x1
> > > > > >>> cache.json with:  EventName as pmu_cache_miss, EventCode as 0x2
> > > > > >>>
> > > > > >>> If we fix the segfault and proceed, still “perf list” will list only one entry for pmu_cache_miss with may be 0x1/0x2 as event code ?
> > > > > >>> Can you check the result to confirm what “perf list” will list in this case ? If it’s going to have only one entry in perf list, does it mean there are two event codes for pmu_cache_miss and it can work with either of the event code ?
> > > > > >>>
> > > > > >>
> > > > > >> Sorry for the late reply.
> > > > > >> Yes, I've checked if there are duplicate pmu_cache_miss events in the
> > > > > >> JSON files, the perf list will have only one entry in perf list.
> > > > > >>
> > > > > >>> If it happens to be a mistake in json file to have duplicate entry with different event code (ex: with some broken commit), I am thinking if the better fix is to keep only the valid entry in json file ?
> > > > > >>>
> > > > > >>
> > > > > >> Yes, I agree we should fix the duplicate events in vendor JSON files.
> > > > > >>
> > > > > >> According to this code snippet [1], it seems the perf tool originally
> > > > > >> allowed duplicate events to exist and it will skip the duplicate
> > > > > >> events not shown on the perf list.
> > > > > >> However, after this commit e6ff1eed3584 ("perf pmu: Lazily add JSON
> > > > > >> events"),  if there are two duplicate events, it causes a segfault.
> > > > > >>
> > > > > >> Can I ask, do you have any suggestions? Thanks.
> > > > > >>
> > > > > >> [1] https://github.com/torvalds/linux/blob/master/tools/perf/util/pmus.c#L491
> > > > > >>
> > > > > >
> > > > > > Kindly ping.
> > > > > >
> > > > > > Can I ask, are there any more comments about this patch? Thanks.
> > > > > >
> > > > > Hi Eric,
> > > > >
> > > > > The functions there says alias and to skip duplicate alias. I am not sure if that is for events
> > > > >
> > > > > Namhyung, Ian, Arnaldo
> > > > > Any comments here ?
> > > >
> > > > So I was trying to reproduce the problem here before looking at the
> > > > patch, tried a simple:
> > > >
> > > > ⬢[acme@...lbox perf-tools-next]$ git diff
> > > > diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/cache.json b/tools/perf/pmu-events/arch/x86/rocketlake/cache.json
> > > > index 2e93b7835b41442b..167a41b0309b7cfc 100644
> > > > --- a/tools/perf/pmu-events/arch/x86/rocketlake/cache.json
> > > > +++ b/tools/perf/pmu-events/arch/x86/rocketlake/cache.json
> > > > @@ -1,4 +1,13 @@
> > > >  [
> > > > +    {
> > > > +        "BriefDescription": "Counts the number of cache lines replaced in L1 data cache.",
> > > > +        "Counter": "0,1,2,3",
> > > > +        "EventCode": "0x51",
> > > > +        "EventName": "L1D.REPLACEMENT",
> > > > +        "PublicDescription": "Counts L1D data line replacements including opportunistic replacements, and replacements that require stall-for-replace or block-for-replace.",
> > > > +        "SampleAfterValue": "100003",
> > > > +        "UMask": "0x1"
> > > > +    },
> > > >      {
> > > >          "BriefDescription": "Counts the number of cache lines replaced in L1 data cache.",
> > > >          "Counter": "0,1,2,3",
> > > > ⬢[acme@...lbox perf-tools-next]$ grep L1D.REPLACEMENT tools/perf/pmu-events/arch/x86/rocketlake/cache.json
> > > >         "EventName": "L1D.REPLACEMENT",
> > > >         "EventName": "L1D.REPLACEMENT",
> > > > ⬢[acme@...lbox perf-tools-next]$
> > > >
> > > > I.e. duplicated that whole event definition:
> > > >
> > > > Did a make clean and a rebuild and:
> > > >
> > > > root@x1:/home/acme/git/pahole# perf list l1d.replacement
> > > >
> > > > List of pre-defined events (to be used in -e or -M):
> > > >
> > > >
> > > > cache:
> > > >   l1d.replacement
> > > >        [Counts the number of cache lines replaced in L1 data cache. Unit: cpu_core]
> > > > root@x1:/home/acme/git/pahole# perf list > /dev/null
> > > > root@x1:/home/acme/git/pahole#
> > > >
> > > > No crash, can you provide instructions on how to reproduce the problem?
> > > >
> > > > I would like to use the experience to add a 'perf test' to show this
> > > > failing and then after the patch it passing that new test.
> > > >
> > >
> > > The segfault occurs when the vendor JSON files contain two events with duplicate names.
> > >
> > > I tried adding two duplicate events "L1D.REPLACEMENT" and "L1D_PEND_MISS.FB_FULL"  and the issue can be reproduced on my laptop on the x86 platform.
> > >
> > > ericl@...cL-ThinkPadX1-TW 11:11 ~/linux_src/linux/tools/perf (master)
> > > git diff
> > > diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/cache.json b/tools/perf/pmu-events/arch/x86/tigerlake/cache.json
> > > index f4144a1110be..71062a82699d 100644
> > > --- a/tools/perf/pmu-events/arch/x86/tigerlake/cache.json
> > > +++ b/tools/perf/pmu-events/arch/x86/tigerlake/cache.json
> > > @@ -8,6 +8,24 @@
> > >          "SampleAfterValue": "100003",
> > >          "UMask": "0x1"
> > >      },
> > > +    {
> > > +        "BriefDescription": "Counts the number of cache lines replaced in L1 data cache.",
> > > +        "Counter": "0,1,2,3",
> > > +        "EventCode": "0x51",
> > > +        "EventName": "L1D.REPLACEMENT",
> > > +        "PublicDescription": "Counts L1D data line replacements including opportunistic replacements, and replacements that require stall-for-replace or block-for-replace.",
> > > +        "SampleAfterValue": "100003",
> > > +        "UMask": "0x1"
> > > +    },
> > > +    {
> > > +        "BriefDescription": "Number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailability.",
> > > +        "Counter": "0,1,2,3",
> > > +        "EventCode": "0x48",
> > > +        "EventName": "L1D_PEND_MISS.FB_FULL",
> > > +        "PublicDescription": "Counts number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailability. Demand requests include cacheable/uncacheable demand load, store, lock or SW prefetch accesses.",
> > > +        "SampleAfterValue": "1000003",
> > > +        "UMask": "0x2"
> > > +    },
> > >      {
> > >          "BriefDescription": "Number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailability.",
> > >          "Counter": "0,1,2,3",
> > >
> > > ericl@...cL-ThinkPadX1-TW 11:11 ~/linux_src/linux/tools/perf (master)
> > > $ ./perf list
> > > Segmentation fault (core dumped)
> >
> > Hi Eric,
> >
> > the series we were asking you to test was:
> > https://lore.kernel.org/linux-perf-users/20240805194424.597244-1-irogers@google.com/
> > It can be convenient to use the b4 tool to grab a set of patches. It
> > is also in the perf-tools-next tree:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/
> > in the tmp.perf-tools-next branch, so you could clone that.
> >
> > Having duplicate events doesn't make sense. When a sysfs event and
> > json event exist with the same name, the json events values override
> > those of the sysfs event:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=tmp.perf-tools-next#n599
> > Two json events doesn't have a clear meaning and which would be found
> > would be dependent on a binary search. To avoid the problem the linked
> > to series forbids duplicate events in the json and adds a build test
> > building all architectures json. As this would break due to duplicate
> > events, as your patch shows, there are json fixes for RISC-V and ARM.
> >
>
> I tested the series and added duplicate events on the JSON files
>
> diff --git a/tools/perf/pmu-events/arch/riscv/sifive/u74/memory.json b/tools/perf/pmu-events/arch/riscv/sifive/u74/memory.json
> index be1a46312ac3..48f5aec4875a 100644
> --- a/tools/perf/pmu-events/arch/riscv/sifive/u74/memory.json
> +++ b/tools/perf/pmu-events/arch/riscv/sifive/u74/memory.json
> @@ -1,4 +1,9 @@
>  [
> +  {
> +    "EventName": "ICACHE_RETIRED",
> +    "EventCode": "0x0000102",
> +    "BriefDescription": "Instruction cache miss"
> +  },
>    {
>      "EventName": "ICACHE_RETIRED",
>      "EventCode": "0x0000102",
> @@ -29,4 +34,4 @@
>      "EventCode": "0x0002002",
>      "BriefDescription": "UTLB miss"
>    }
>
> When building the perf tool, it will show the build error as follows:
>
>
>   CC      tests/pmu-events.o
>   CC      util/cacheline.o
> Traceback (most recent call last):
>   File "pmu-events/jevents.py", line 1313, in <module>
>     main()
>   File "pmu-events/jevents.py", line 1304, in main
>     ftw(arch_path, [], process_one_file)
>   File "pmu-events/jevents.py", line 1245, in ftw
>     ftw(item.path, parents + [item.name], action)
>   File "pmu-events/jevents.py", line 1243, in ftw
>     action(parents, item)
>   File "pmu-events/jevents.py", line 646, in process_one_file
>     print_pending_events()
>   File "pmu-events/jevents.py", line 510, in print_pending_events
>     assert event.name != last_name, f"Duplicate event: {last_pmu}/{last_name}/ in {_pending_events_tblname}"
> AssertionError: Duplicate event: default_core/icache_retired/ in pmu_events__sifive_u74
>   LD      arch/riscv/util/perf-util-in.o
> pmu-events/Build:35: recipe for target 'pmu-events/pmu-events.c' failed
> make[3]: *** [pmu-events/pmu-events.c] Error 1
> make[3]: *** Deleting file 'pmu-events/pmu-events.c'
> Makefile.perf:763: recipe for target 'pmu-events/pmu-events-in.o' failed
> make[2]: *** [pmu-events/pmu-events-in.o] Error 2
> make[2]: *** Waiting for unfinished jobs....
>   CC      bench/futex-lock-pi.o
>
> I think the patch series can detect if the vendor JSON file has duplicate events when building the perf tool. Thanks.
>
Hi Ian, Arnaldo

I tested the duplicate event with tmp.perf-tools-next branch from
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/

Thanks.

Tested-by: Eric Lin <eric.lin@...ive.com>

Best regards,
Eric Lin

>
> Best regards,
> Eric Lin
>
> > Thanks,
> > Ian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ