Message-ID: <CAP-5=fUndbBhSLb35_bL-+Xu3erB6ssx-sAEYaf7mgxPawNEbA@mail.gmail.com>
Date: Mon, 19 Aug 2024 22:41:49 -0700
From: Ian Rogers <irogers@...gle.com>
To: Jon Kohler <jon@...anix.com>
Cc: "adrian.hunter@...el.com" <adrian.hunter@...el.com>, 
	"linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, 
	Kan Liang <kan.liang@...ux.intel.com>, 
	"alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>
Subject: Re: Perf test failures for 10.2 PMU event map aliases

On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@...anix.com> wrote:
>
> Reaching out to the perf community for feedback on the following
> observed test failure. On 6.6.y, I see persistent failures with test
> 10.2 PMU event map aliases, complaining about testing aliases uncore
> PMU mismatches. I've included two outputs below, one with a bit of
> hacky print debugging.
>
> Using Intel(R) Xeon(R) Gold 6154 CPU:
>         10.2: PMU event map aliases                                         :
>         --- start ---
>         test child forked, pid 962901
>         Using CPUID GenuineIntel-6-55-4

Hi Jon,

Sorry for the brief reply but I thought some quick hints might unblock
you on this. The CPUID lines up with a SkylakeX:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n33

>         testing core PMU cpu aliases: pass
>         JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0
>         JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1
>         testing aliases uncore PMU uncore_imc_0: mismatch expected aliases
>           (1) vs found (4)
>         test child finished with -1
>         ---- end ----
>         PMU events subtest 2: FAILED!
>
> Using Intel(R) Xeon(R) Platinum 8352Y:
>         10.2: PMU event map aliases                                         :
>         --- start ---
>         test child forked, pid 1765070
>         Using CPUID GenuineIntel-6-6A-6

This is an IcelakeX:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/mapfile.csv?h=perf-tools-next#n18

>         testing core PMU cpu aliases: pass
>         testing aliases uncore PMU uncore_imc_free_running_0: mismatch
>           expected aliases (1) vs found (6)
>         test child finished with -1
>         ---- end ----
>         PMU events subtest 2: FAILED!
>
> Digging in more, looking at pmu_aliases_parse, I see that we'll discard
> scale and unit files in pmu_alias_info_file, which leaves us with 3x
> aliases in the uncore_imc_0 in the first case and 5x aliases in the
> uncore_imc_free_running_0 second case.
>
> # From 6154-based system:
> ls -lhat /sys/devices/uncore_imc_0/events

The "uncore_" prefix and the "_0" suffix are optional when matching PMU
names, and the name matching is case insensitive. The json events for
this PMU are listed here:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json?h=perf-tools-next
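
To make the matching rule concrete, here is a minimal sketch of a
comparison that treats the "uncore_" prefix and a trailing "_<number>"
suffix as optional and ignores case. The helper names
(skip_uncore_prefix, len_without_num_suffix, pmu_name_matches) are
hypothetical and written only for illustration; this is not the code
perf itself uses.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: skip a leading "uncore_" if present (any case). */
static const char *skip_uncore_prefix(const char *name)
{
	static const char prefix[] = "uncore_";
	size_t len = sizeof(prefix) - 1;

	assert(name != NULL);
	return strncasecmp(name, prefix, len) == 0 ? name + len : name;
}

/* Hypothetical helper: length of name without a trailing "_<digits>". */
static size_t len_without_num_suffix(const char *name)
{
	size_t len = strlen(name);
	size_t i = len;

	while (i > 0 && isdigit((unsigned char)name[i - 1]))
		i--;
	/* Strip only when at least one digit follows an underscore. */
	if (i < len && i > 0 && name[i - 1] == '_')
		return i - 1;
	return len;
}

/* Illustrative only: compare a json "Unit" against a sysfs PMU name. */
static bool pmu_name_matches(const char *json_unit, const char *sysfs_name)
{
	const char *a = skip_uncore_prefix(json_unit);
	const char *b = skip_uncore_prefix(sysfs_name);
	size_t alen = len_without_num_suffix(a);
	size_t blen = len_without_num_suffix(b);

	return alen == blen && strncasecmp(a, b, alen) == 0;
}
```

Under this sketch, a "Unit" of "iMC" matches the device uncore_imc_0,
while it does not match uncore_imc_free_running_0.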

> total 0
> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale
> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit
> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale
> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit
> -r--r--r--. 1 root root 4.0K Aug  9 15:30 cas_count_read
> -r--r--r--. 1 root root 4.0K Aug  9 15:30 cas_count_write
> -r--r--r--. 1 root root 4.0K Aug  9 15:30 clockticks

This should be 3 sysfs events (I don't like the term alias). Note that
we load the sysfs and json events lazily to avoid overhead.
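
As a rough illustration of how the listing reduces to 3 events, the
sketch below counts directory entries while skipping the ".scale" and
".unit" companion files mentioned in the quoted text, plus "." and
"..". The function names has_suffix and count_sysfs_events are
hypothetical, and the real parser may skip further suffixes that are
not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if name ends with suffix. */
static bool has_suffix(const char *name, const char *suffix)
{
	size_t nlen = strlen(name);
	size_t slen = strlen(suffix);

	return nlen >= slen && strcmp(name + nlen - slen, suffix) == 0;
}

/*
 * Illustrative only: count entries that look like event files,
 * ignoring ".", ".." and the .scale/.unit companion files.
 */
static size_t count_sysfs_events(const char *const *names, size_t n)
{
	size_t count = 0;

	assert(names != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const char *name = names[i];

		if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
			continue;
		if (has_suffix(name, ".scale") || has_suffix(name, ".unit"))
			continue;
		count++;
	}
	return count;
}
```

Fed the seven file names from the uncore_imc_0 listing above, this
returns 3 (cas_count_read, cas_count_write and clockticks).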

> drwxr-xr-x. 2 root root    0 Jul 17 03:40 .
> drwxr-xr-x. 5 root root    0 Jul 17 02:52 ..
>
> # From the 8352Y-based system:
> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events
> total 0
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale
> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit
> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk
> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read
> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write
> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read
> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write

This is 5 sysfs events; the json events are here:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json?h=perf-tools-next#n134
Note that the "Unit" field, which names the PMU, should be
imc_free_running to match this device.
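
The counts in the quoted output are consistent with "found" being the
sum of sysfs events and json events for the PMU. The sketch below only
restates that arithmetic; struct alias_counts and both functions are
hypothetical names. The split 3 + 1 for the 6154 comes from the JKDBG
lines above, whereas 5 + 1 for the 8352Y is an assumption inferred from
"found (6)" and the 5 sysfs events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one PMU under test. */
struct alias_counts {
	int sysfs_events; /* event files, minus .scale/.unit companions */
	int json_events;  /* json events attributed to this PMU */
	int expected;     /* entries in the test's table for this PMU */
};

/* Total the test would report as "found" under this reading. */
static int found_total(const struct alias_counts *c)
{
	assert(c != NULL);
	return c->sysfs_events + c->json_events;
}

/* True when the test would take its mismatch path. */
static bool counts_mismatch(const struct alias_counts *c)
{
	return found_total(c) != c->expected;
}
```

With { 3, 1, 1 } for uncore_imc_0 the total is 4 against 1 expected, and
with the assumed { 5, 1, 1 } for uncore_imc_free_running_0 it is 6
against 1, matching the two failure messages.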

> drwxr-xr-x. 2 root root    0 Aug 15 03:15 .
> drwxr-xr-x. 5 root root    0 Aug 15 02:42 ..
>
> Looking at the structure of __test_uncore_pmu_event_aliases, however,
> I'm not quite sure how this is supposed to work. I've annotated a walk
> through below to highlight where things are going off the rails.
>
> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
> {
> ...
>         /* Count how many aliases we generated */
>         alias_count = perf_pmu__num_events(pmu);
>                 // alias_count == 4 in the 6154-based system
>                 // alias_count == 6 in the 8352Y-based system
>
>         /* Count how many aliases we expect from the known table */
>         for (table = &test_pmu->aliases[0]; *table; table++)
>                 to_match_count++;
>                         // this is looking at aliases in struct perf_pmu_test_pmu
>                         // table, which for uncore_imc_0 is a single entry for
>                         // &uncore_imc_cache_hits.
>                         //
>                         // for the 8352Y case, likewise, we only have a single alias
>                         // in the table for &uncore_imc_free_running_cache_miss.
>                         //
>                         // in both cases, to_match_count == 1
>
>         // Compare 4 vs 1 or 6 vs 1
>         if (alias_count != to_match_count) {
>                 pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n",
>                          pmu_name, to_match_count /* 1 */, alias_count /* 4 */);
>                 return -1;
>                         // we seemed doomed to hit this conditional always, no?
>         }
> ...
> }
>
> I did a walkthrough of the latest mainline code, and don't see a marked
> difference that jumps off the page to me that'd correct this behavior,
> and would love a helping hand to point me in the right direction on this.
>
> What am I missing here?

I'll need some more time to dig into this. Hopefully the pointers above help.

Thanks,
Ian
