lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fXSiqQT+-mztdrV=Y6p+iPhZnmY9uqL08KrEe3bSybJxw@mail.gmail.com>
Date: Mon, 5 Aug 2024 21:00:44 -0700
From: Ian Rogers <irogers@...gle.com>
To: Eric Lin <eric.lin@...ive.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>, Athira Rajeev <atrajeev@...ux.vnet.ibm.com>, 
	Namhyung Kim <namhyung@...nel.org>, Peter Zijlstra <peterz@...radead.org>, 
	Ingo Molnar <mingo@...hat.com>, Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, Kan Liang <kan.liang@...ux.intel.com>, 
	James Clark <james.clark@....com>, linux-perf-users <linux-perf-users@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>, vincent.chen@...ive.com, 
	greentime.hu@...ive.com, Samuel Holland <samuel.holland@...ive.com>
Subject: Re: [PATCH] perf pmus: Fix duplicate events caused segfault

On Mon, Aug 5, 2024 at 8:29 PM Eric Lin <eric.lin@...ive.com> wrote:
>
>
> Hi Arnaldo,
>
> On Mon, Aug 5, 2024 at 11:43 PM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
> >
> > On Mon, Aug 05, 2024 at 07:54:33PM +0530, Athira Rajeev wrote:
> > >
> > >
> > > > On 4 Aug 2024, at 8:36 PM, Eric Lin <eric.lin@...ive.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Sun, Jul 21, 2024 at 11:44 PM Eric Lin <eric.lin@...ive.com> wrote:
> > > >>
> > > >> Hi Athira,
> > > >>
> > > >> On Sat, Jul 20, 2024 at 4:35 PM Athira Rajeev
> > > >> <atrajeev@...ux.vnet.ibm.com> wrote:
> > > >>>
> > > >>>
> > > >>>
> > > >>>> On 19 Jul 2024, at 1:46 PM, Eric Lin <eric.lin@...ive.com> wrote:
> > > >>>>
> > > >>>> Currently, if vendor JSON files have two duplicate event names,
> > > >>>> the "perf list" command will trigger a segfault.
> > > >>>>
> > > >>>> In commit e6ff1eed3584 ("perf pmu: Lazily add JSON events"),
> > > >>>> pmu_events_table__num_events() gets the number of JSON events
> > > >>>> from table_pmu->num_entries, which includes duplicate events
> > > >>>> if there are duplicate event names in the JSON files.
> > > >>>
> > > >>> Hi Eric,
> > > >>>
> > > >>> Let us consider there are duplicate event names in the JSON files, say :
> > > >>>
> > > >>> metric.json with: EventName as pmu_cache_miss, EventCode as 0x1
> > > >>> cache.json with:  EventName as pmu_cache_miss, EventCode as 0x2
> > > >>>
> > > >>> If we fix the segfault and proceed, still “perf list” will list only one entry for pmu_cache_miss with may be 0x1/0x2 as event code ?
> > > >>> Can you check the result to confirm what “perf list” will list in this case ? If it’s going to have only one entry in perf list, does it mean there are two event codes for pmu_cache_miss and it can work with either of the event code ?
> > > >>>
> > > >>
> > > >> Sorry for the late reply.
> > > >> Yes, I've checked if there are duplicate pmu_cache_miss events in the
> > > >> JSON files, the perf list will have only one entry in perf list.
> > > >>
> > > >>> If it happens to be a mistake in json file to have duplicate entry with different event code (ex: with some broken commit), I am thinking if the better fix is to keep only the valid entry in json file ?
> > > >>>
> > > >>
> > > >> Yes, I agree we should fix the duplicate events in vendor JSON files.
> > > >>
> > > >> According to this code snippet [1], it seems the perf tool originally
> > > >> allowed duplicate events to exist and it will skip the duplicate
> > > >> events not shown on the perf list.
> > > >> However, after this commit e6ff1eed3584 ("perf pmu: Lazily add JSON
> > > >> events"),  if there are two duplicate events, it causes a segfault.
> > > >>
> > > >> Can I ask, do you have any suggestions? Thanks.
> > > >>
> > > >> [1] https://github.com/torvalds/linux/blob/master/tools/perf/util/pmus.c#L491
> > > >>
> > > >
> > > > Kindly ping.
> > > >
> > > > Can I ask, are there any more comments about this patch? Thanks.
> > > >
> > > Hi Eric,
> > >
> > > The functions there says alias and to skip duplicate alias. I am not sure if that is for events
> > >
> > > Namhyung, Ian, Arnaldo
> > > Any comments here ?
> >
> > So I was trying to reproduce the problem here before looking at the
> > patch, tried a simple:
> >
> > ⬢[acme@...lbox perf-tools-next]$ git diff
> > diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/cache.json b/tools/perf/pmu-events/arch/x86/rocketlake/cache.json
> > index 2e93b7835b41442b..167a41b0309b7cfc 100644
> > --- a/tools/perf/pmu-events/arch/x86/rocketlake/cache.json
> > +++ b/tools/perf/pmu-events/arch/x86/rocketlake/cache.json
> > @@ -1,4 +1,13 @@
> >  [
> > +    {
> > +        "BriefDescription": "Counts the number of cache lines replaced in L1 data cache.",
> > +        "Counter": "0,1,2,3",
> > +        "EventCode": "0x51",
> > +        "EventName": "L1D.REPLACEMENT",
> > +        "PublicDescription": "Counts L1D data line replacements including opportunistic replacements, and replacements that require stall-for-replace or block-for-replace.",
> > +        "SampleAfterValue": "100003",
> > +        "UMask": "0x1"
> > +    },
> >      {
> >          "BriefDescription": "Counts the number of cache lines replaced in L1 data cache.",
> >          "Counter": "0,1,2,3",
> > ⬢[acme@...lbox perf-tools-next]$ grep L1D.REPLACEMENT tools/perf/pmu-events/arch/x86/rocketlake/cache.json
> >         "EventName": "L1D.REPLACEMENT",
> >         "EventName": "L1D.REPLACEMENT",
> > ⬢[acme@...lbox perf-tools-next]$
> >
> > I.e. duplicated that whole event definition:
> >
> > Did a make clean and a rebuild and:
> >
> > root@x1:/home/acme/git/pahole# perf list l1d.replacement
> >
> > List of pre-defined events (to be used in -e or -M):
> >
> >
> > cache:
> >   l1d.replacement
> >        [Counts the number of cache lines replaced in L1 data cache. Unit: cpu_core]
> > root@x1:/home/acme/git/pahole# perf list > /dev/null
> > root@x1:/home/acme/git/pahole#
> >
> > No crash, can you provide instructions on how to reproduce the problem?
> >
> > I would like to use the experience to add a 'perf test' to show this
> > failing and then after the patch it passing that new test.
> >
>
> The segfault occurs when the vendor JSON files contain two events with duplicate names.
>
> I tried adding two duplicate events "L1D.REPLACEMENT" and "L1D_PEND_MISS.FB_FULL"  and the issue can be reproduced on my laptop on the x86 platform.
>
> ericl@...cL-ThinkPadX1-TW 11:11 ~/linux_src/linux/tools/perf (master)
> git diff
> diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/cache.json b/tools/perf/pmu-events/arch/x86/tigerlake/cache.json
> index f4144a1110be..71062a82699d 100644
> --- a/tools/perf/pmu-events/arch/x86/tigerlake/cache.json
> +++ b/tools/perf/pmu-events/arch/x86/tigerlake/cache.json
> @@ -8,6 +8,24 @@
>          "SampleAfterValue": "100003",
>          "UMask": "0x1"
>      },
> +    {
> +        "BriefDescription": "Counts the number of cache lines replaced in L1 data cache.",
> +        "Counter": "0,1,2,3",
> +        "EventCode": "0x51",
> +        "EventName": "L1D.REPLACEMENT",
> +        "PublicDescription": "Counts L1D data line replacements including opportunistic replacements, and replacements that require stall-for-replace or block-for-replace.",
> +        "SampleAfterValue": "100003",
> +        "UMask": "0x1"
> +    },
> +    {
> +        "BriefDescription": "Number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailability.",
> +        "Counter": "0,1,2,3",
> +        "EventCode": "0x48",
> +        "EventName": "L1D_PEND_MISS.FB_FULL",
> +        "PublicDescription": "Counts number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailability. Demand requests include cacheable/uncacheable demand load, store, lock or SW prefetch accesses.",
> +        "SampleAfterValue": "1000003",
> +        "UMask": "0x2"
> +    },
>      {
>          "BriefDescription": "Number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailability.",
>          "Counter": "0,1,2,3",
>
> ericl@...cL-ThinkPadX1-TW 11:11 ~/linux_src/linux/tools/perf (master)
> $ ./perf list
> Segmentation fault (core dumped)

Hi Eric,

the series we were asking you to test was:
https://lore.kernel.org/linux-perf-users/20240805194424.597244-1-irogers@google.com/
It can be convenient to use the b4 tool to grab a set of patches. It
is also in the perf-tools-next tree:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/
in the tmp.perf-tools-next branch, so you could clone that.

Having duplicate events doesn't make sense. When a sysfs event and
json event exist with the same name, the json events values override
those of the sysfs event:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=tmp.perf-tools-next#n599
Two json events doesn't have a clear meaning and which would be found
would be dependent on a binary search. To avoid the problem the linked
to series forbids duplicate events in the json and adds a build test
building all architectures json. As this would break due to duplicate
events, as your patch shows, there are json fixes for RISC-V and ARM.

Thanks,
Ian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ