lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <067B5698-9C34-4283-888B-DC691F3462A8@nutanix.com>
Date: Thu, 22 Aug 2024 15:01:34 +0000
From: Jon Kohler <jon@...anix.com>
To: Ian Rogers <irogers@...gle.com>
CC: "adrian.hunter@...el.com" <adrian.hunter@...el.com>,
        "linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
        LKML
	<linux-kernel@...r.kernel.org>,
        Kan Liang <kan.liang@...ux.intel.com>,
        "alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>
Subject: Re: Perf test failures for 10.2 PMU event map aliases



> On Aug 20, 2024, at 11:17 AM, Ian Rogers <irogers@...gle.com> wrote:
> 
> !-------------------------------------------------------------------|
>  CAUTION: External Email
> 
> |-------------------------------------------------------------------!
> 
> On Tue, Aug 20, 2024 at 6:54 AM Jon Kohler <jon@...anix.com> wrote:
>> 
>> 
>> 
>>> On Aug 20, 2024, at 1:41 AM, Ian Rogers <irogers@...gle.com> wrote:
>>> 
>>> !-------------------------------------------------------------------|
>>> CAUTION: External Email
>>> 
>>> |-------------------------------------------------------------------!
>>> 
>>> On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@...anix.com> wrote:
>>>> 
>>>> Reaching out to the perf community for feedback on the following
>>>> observed test failure. On 6.6.y, I see persistent failures with test
>>>> 10.2 PMU event map aliases, complaining about testing aliases uncore
>>>> PMU mismatches. I've included two outputs below, one with a bit of
>>>> hacky print debugging.
>>>> 
>>>> Using Intel(R) Xeon(R) Gold 6154 CPU:
>>>>       10.2: PMU event map aliases                                         :
>>>>       --- start ---
>>>>       test child forked, pid 962901
>>>>       Using CPUID GenuineIntel-6-55-4
>>> 
>>> Hi Jon,
>>> 
>>> Sorry for the brief reply but I thought some quick hints might unblock
>>> you on this. The CPUID lines up with a SkylakeX:
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n33&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=F-eXsmTASgRsptt5Gahro6fRyMwEQdjZ6PtY7vhzIKM&e=
>>> 
>>>>       testing core PMU cpu aliases: pass
>>>>       JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0
>>>>       JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1
>>>>       testing aliases uncore PMU uncore_imc_0: mismatch expected aliases
>>>>         (1) vs found (4)
>>>>       test child finished with -1
>>>>       ---- end ----
>>>>       PMU events subtest 2: FAILED!
>>>> 
>>>> Using Intel(R) Xeon(R) Platinum 8352Y:
>>>>       10.2: PMU event map aliases                                         :
>>>>       --- start ---
>>>>       test child forked, pid 1765070
>>>>       Using CPUID GenuineIntel-6-6A-6
>>> 
>>> This is an IcelakeX:
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n18&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=6DwD4ZmywAtcwCnRjx7wRfmdW_G65wHIuyZJIc__2yc&e=
>>> 
>>>>       testing core PMU cpu aliases: pass
>>>>       testing aliases uncore PMU uncore_imc_free_running_0: mismatch
>>>>         expected aliases (1) vs found (6)
>>>>       test child finished with -1
>>>>       ---- end ----
>>>>       PMU events subtest 2: FAILED!
>>>> 
>>>> Digging in more, looking at pmu_aliases_parse, I see that we'll discard
>>>> scale and unit files in pmu_alias_info_file, which leaves us with 3x
>>>> aliases in the uncore_imc_0 in the first case and 5x aliases in the
>>>> uncore_imc_free_running_0 second case.
>>>> 
>>>> # From 6154-based system:
>>>> ls -lhat /sys/devices/uncore_imc_0/events
>>> 
>>> The "uncore_" prefix and the "_0" suffix are optional, the naming
>>> matching is case insensitive. In the event json the events are listed
>>> here:
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_skylakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=FpAgVwLmTyXUFQIMZ_gbPlH9aXvrmcJ8CZaW3tKIaj4&e=
>>> 
>>>> total 0
>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale
>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit
>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale
>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit
>>>> -r--r--r--. 1 root root 4.0K Aug  9 15:30 cas_count_read
>>>> -r--r--r--. 1 root root 4.0K Aug  9 15:30 cas_count_write
>>>> -r--r--r--. 1 root root 4.0K Aug  9 15:30 clockticks
>>> 
>>> This should be 3 sysfs events (I don't like the term alias), note that
>>> we load the sysfs and json events lazily to avoid overhead.
>>> 
>>>> drwxr-xr-x. 2 root root    0 Jul 17 03:40 .
>>>> drwxr-xr-x. 5 root root    0 Jul 17 02:52 ..
>>>> 
>>>> # From the 8352Y-based system:
>>>> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events
>>>> total 0
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale
>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit
>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk
>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read
>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write
>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read
>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write
>>> 
>>> This is 5 sysfs events, the json events are here:
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_icelakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext-23n134&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=MrHuUCZFqrNrd05IPyq4fuZDH4_owkEw0xHcc7bvGvU&e=
>>> Note, the "Unit", meaning the PMU should be imc_free_running to match
>>> this device.
>>> 
>>>> drwxr-xr-x. 2 root root    0 Aug 15 03:15 .
>>>> drwxr-xr-x. 5 root root    0 Aug 15 02:42 ..
>>>> 
>>>> Looking at the structure of __test_uncore_pmu_event_aliases, however,
>>>> I'm not quite sure how this is supposed to work. I've annotated a walk
>>>> through below to highlight where things are going off the rails.
>>>> 
>>>> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>>>> {
>>>> ...
>>>>       /* Count how many aliases we generated */
>>>>       alias_count = perf_pmu__num_events(pmu);
>>>>               // alias_count == 4 in the 6154-based system
>>>>               // alias_count == 6 in the 8352Y-based system
>>>> 
>>>>       /* Count how many aliases we expect from the known table */
>>>>       for (table = &test_pmu->aliases[0]; *table; table++)
>>>>               to_match_count++;
>>>>                       // this is looking at aliases in struct perf_pmu_test_pmu
>>>>                       // table, which for uncore_imc_0 is a single entry for
>>>>                       // &uncore_imc_cache_hits.
>>>>                       //
>>>>                       // for the 8352Y case, likewise, we only have a single alias
>>>>                       // in the table for &uncore_imc_free_running_cache_miss.
>>>>                       //
>>>>                       // in both cases, to_match_count == 1
>>>> 
>>>>       // Compare 4 vs 1 or 6 vs 1
>>>>       if (alias_count != to_match_count) {
>>>>               pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n",
>>>>                        pmu_name, to_match_count /* 1 */, alias_count /* 4 */);
>>>>               return -1;
>>>>                       // we seemed doomed to hit this conditional always, no?
>>>>       }
>>>> ...
>>>> }
>>>> 
>>>> I did a walkthrough of the latest mainline code, and don't see a marked
>>>> difference that jump off the page to me that'd correct this behavior,
>>>> and would love a helping hand to point in the right direction on this.
>>>> 
>>>> What am I missing here?
>>> 
>>> I'll need some more time to dig into this. Hopefully the pointers above help.
>> 
>> Thanks for the quick reply and pointers, I appreciate it. The tricky bit still
>> remains, as the code I posted to above seems to solely depend on the
>> info filled into struct perf_pmu_test_pmu, right? If so, I don’t see how the
>> dots connect from this test to the other events in sysfs/json’s.
> 
> So looking at the test it is using the testcpu:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_tests_pmu-2Devents.c-3Fh-3Dperf-2Dtools-2Dnext-23n602&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=upMgwNSdGAw5sdDUTdoyvXhLy4KhFUYqPdxZKx8Ov-ZxDYERFVy8PU040wwDAYPp&s=erGg8kUByjl_j5R0D0PxRZjTZhvazxwC9KW8rOT9Pp4&e= 
> the json for that is here:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_test_test-5Fsoc_cpu-3Fh-3Dperf-2Dtools-2Dnext&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=upMgwNSdGAw5sdDUTdoyvXhLy4KhFUYqPdxZKx8Ov-ZxDYERFVy8PU040wwDAYPp&s=z535_TbF_oJLjEoRuhbbqzB9Xo5MwWWmOcP0pgMulWY&e= 
> The names in the test are based on ones seen on real CPUs, so this may
> be leading to the confusion.

Hey Ian,
I was able to debug this a bit more. The following diff fixes this test on my system.

Even though we were supposed to be using the test data only, the sysfs entries
from my systems, which happened to have similar names, threw a wrench in
this test.

With this diff, we just use the JSON aliases that were added.

Happy to send this out as a formal patch, but wanted to get the list’s 2cents
first, as I feel like I’m missing something :) 

Jon

diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index f5321fbdee79..893dc7afee76 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -584,6 +584,9 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
        const struct pmu_events_table *events_table;
        int res = 0;
 
+       /* CPU events come from struct pmu_event pmu_events__test_soc_cpu
+        * and sys events come from struct pmu_event pmu_events__test_soc_sys
+        */
        events_table = find_core_events_table("testarch", "testcpu");
        if (!events_table)
                return -1;
@@ -593,10 +596,14 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
        pmu->sysfs_aliases_loaded = true;
        pmu_add_sys_aliases(pmu);
 
-       /* Count how many aliases we generated */
-       alias_count = perf_pmu__num_events(pmu);
+       /* How many events we gathered for this PMU in test_soc.
+        * Note: we specifically do not use perf_pmu__num_events as that may
+        * include spurious system events from the system under test, which
+        * may have similarly named PMUs.
+        */
+       alias_count = pmu->cpu_json_aliases + pmu->sys_json_aliases;
 
-       /* Count how many aliases we expect from the known table */
+       /* How many aliases we expect from struct perf_pmu_test_pmu test_pmus */
        for (table = &test_pmu->aliases[0]; *table; table++)
                to_match_count++;

> 
> Thanks,
> Ian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ