Message-ID: <05E55194-C265-4BDA-911D-B9E57EED3CBB@nutanix.com>
Date: Tue, 20 Aug 2024 02:06:02 +0000
From: Jon Kohler <jon@...anix.com>
To: "irogers@...gle.com" <irogers@...gle.com>,
"adrian.hunter@...el.com"
<adrian.hunter@...el.com>,
"linux-perf-users@...r.kernel.org"
<linux-perf-users@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Kan Liang <kan.liang@...ux.intel.com>,
"alexander.shishkin@...ux.intel.com"
<alexander.shishkin@...ux.intel.com>
Subject: Perf test failures for 10.2 PMU event map aliases
Reaching out to the perf community for feedback on the following test
failure. On 6.6.y, I see persistent failures in test 10.2 (PMU event
map aliases), which reports alias count mismatches for uncore PMUs.
I've included two outputs below, one with a bit of hacky print
debugging.
Using Intel(R) Xeon(R) Gold 6154 CPU:
10.2: PMU event map aliases :
--- start ---
test child forked, pid 962901
Using CPUID GenuineIntel-6-55-4
testing core PMU cpu aliases: pass
JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0
JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1
testing aliases uncore PMU uncore_imc_0: mismatch expected aliases (1) vs found (4)
test child finished with -1
---- end ----
PMU events subtest 2: FAILED!
Using Intel(R) Xeon(R) Platinum 8352Y:
10.2: PMU event map aliases :
--- start ---
test child forked, pid 1765070
Using CPUID GenuineIntel-6-6A-6
testing core PMU cpu aliases: pass
testing aliases uncore PMU uncore_imc_free_running_0: mismatch expected aliases (1) vs found (6)
test child finished with -1
---- end ----
PMU events subtest 2: FAILED!
Digging in more, looking at pmu_aliases_parse, I see that we discard
the .scale and .unit files via pmu_alias_info_file, which leaves 3
aliases for uncore_imc_0 in the first case and 5 aliases for
uncore_imc_free_running_0 in the second case.
# From 6154-based system:
ls -lhat /sys/devices/uncore_imc_0/events
total 0
-r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale
-r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit
-r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale
-r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit
-r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_read
-r--r--r--. 1 root root 4.0K Aug 9 15:30 cas_count_write
-r--r--r--. 1 root root 4.0K Aug 9 15:30 clockticks
drwxr-xr-x. 2 root root 0 Jul 17 03:40 .
drwxr-xr-x. 5 root root 0 Jul 17 02:52 ..
# From the 8352Y-based system:
ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events
total 0
-r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale
-r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit
-r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale
-r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit
-r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale
-r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit
-r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale
-r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit
-r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk
-r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read
-r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write
-r--r--r--. 1 root root 4.0K Aug 19 21:33 read
-r--r--r--. 1 root root 4.0K Aug 19 21:33 write
drwxr-xr-x. 2 root root 0 Aug 15 03:15 .
drwxr-xr-x. 5 root root 0 Aug 15 02:42 ..
Looking at the structure of __test_uncore_pmu_event_aliases, however,
I'm not quite sure how this is supposed to work. I've annotated a
walkthrough below to highlight where things go off the rails.
static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
{
	...
	/* Count how many aliases we generated */
	alias_count = perf_pmu__num_events(pmu);
	// alias_count == 4 in the 6154-based system
	// alias_count == 6 in the 8352Y-based system

	/* Count how many aliases we expect from the known table */
	for (table = &test_pmu->aliases[0]; *table; table++)
		to_match_count++;
	// this is looking at aliases in struct perf_pmu_test_pmu
	// table, which for uncore_imc_0 is a single entry for
	// &uncore_imc_cache_hits.
	//
	// for the 8352Y case, likewise, we only have a single alias
	// in the table for &uncore_imc_free_running_cache_miss.
	//
	// in both cases, to_match_count == 1

	// Compare 4 vs 1 or 6 vs 1
	if (alias_count != to_match_count) {
		pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n",
			 pmu_name, to_match_count /* 1 */, alias_count /* 4 */);
		return -1;
		// we seem doomed to always hit this conditional, no?
	}
	...
}
I did a walkthrough of the latest mainline code and don't see a marked
difference that jumps off the page and would correct this behavior. I
would love a helping hand pointing me in the right direction.
What am I missing here?
Thanks all,
Jon