lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2F311A4E-7D25-44EA-89CC-7228F1ABA18F@nutanix.com>
Date: Thu, 22 Aug 2024 15:37:23 +0000
From: Jon Kohler <jon@...anix.com>
To: Ian Rogers <irogers@...gle.com>
CC: "adrian.hunter@...el.com" <adrian.hunter@...el.com>,
        "linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
        LKML
	<linux-kernel@...r.kernel.org>,
        Kan Liang <kan.liang@...ux.intel.com>,
        "alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>
Subject: Re: Perf test failures for 10.2 PMU event map aliases



> On Aug 22, 2024, at 11:15 AM, Ian Rogers <irogers@...gle.com> wrote:
> 
> !-------------------------------------------------------------------|
>  CAUTION: External Email
> 
> |-------------------------------------------------------------------!
> 
> On Thu, Aug 22, 2024 at 8:01 AM Jon Kohler <jon@...anix.com> wrote:
>> 
>> 
>> 
>>> On Aug 20, 2024, at 11:17 AM, Ian Rogers <irogers@...gle.com> wrote:
>>> 
>>> !-------------------------------------------------------------------|
>>> CAUTION: External Email
>>> 
>>> |-------------------------------------------------------------------!
>>> 
>>> On Tue, Aug 20, 2024 at 6:54 AM Jon Kohler <jon@...anix.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Aug 20, 2024, at 1:41 AM, Ian Rogers <irogers@...gle.com> wrote:
>>>>> 
>>>>> !-------------------------------------------------------------------|
>>>>> CAUTION: External Email
>>>>> 
>>>>> |-------------------------------------------------------------------!
>>>>> 
>>>>> On Mon, Aug 19, 2024 at 7:06 PM Jon Kohler <jon@...anix.com> wrote:
>>>>>> 
>>>>>> Reaching out to the perf community for feedback on the following
>>>>>> observed test failure. On 6.6.y, I see persistent failures with test
>>>>>> 10.2 PMU event map aliases, complaining about testing aliases uncore
>>>>>> PMU mismatches. I've included two outputs below, one with a bit of
>>>>>> hacky print debugging.
>>>>>> 
>>>>>> Using Intel(R) Xeon(R) Gold 6154 CPU:
>>>>>>      10.2: PMU event map aliases                                         :
>>>>>>      --- start ---
>>>>>>      test child forked, pid 962901
>>>>>>      Using CPUID GenuineIntel-6-55-4
>>>>> 
>>>>> Hi Jon,
>>>>> 
>>>>> Sorry for the brief reply but I thought some quick hints might unblock
>>>>> you on this. The CPUID lines up with a SkylakeX:
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n33&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=F-eXsmTASgRsptt5Gahro6fRyMwEQdjZ6PtY7vhzIKM&e=
>>>>> 
>>>>>>      testing core PMU cpu aliases: pass
>>>>>>      JKDBG: pmu nr total 3 pmu->sysfs_aliases 3 pmu->sys_json_aliases 0
>>>>>>      JKDBG: pmu cpu_aliases_added nr total 4 pmu->cpu_json_aliases 1
>>>>>>      testing aliases uncore PMU uncore_imc_0: mismatch expected aliases
>>>>>>        (1) vs found (4)
>>>>>>      test child finished with -1
>>>>>>      ---- end ----
>>>>>>      PMU events subtest 2: FAILED!
>>>>>> 
>>>>>> Using Intel(R) Xeon(R) Platinum 8352Y:
>>>>>>      10.2: PMU event map aliases                                         :
>>>>>>      --- start ---
>>>>>>      test child forked, pid 1765070
>>>>>>      Using CPUID GenuineIntel-6-6A-6
>>>>> 
>>>>> This is an IcelakeX:
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_mapfile.csv-3Fh-3Dperf-2Dtools-2Dnext-23n18&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=6DwD4ZmywAtcwCnRjx7wRfmdW_G65wHIuyZJIc__2yc&e=
>>>>> 
>>>>>>      testing core PMU cpu aliases: pass
>>>>>>      testing aliases uncore PMU uncore_imc_free_running_0: mismatch
>>>>>>        expected aliases (1) vs found (6)
>>>>>>      test child finished with -1
>>>>>>      ---- end ----
>>>>>>      PMU events subtest 2: FAILED!
>>>>>> 
>>>>>> Digging in more, looking at pmu_aliases_parse, I see that we'll discard
>>>>>> scale and unit files in pmu_alias_info_file, which leaves us with 3x
>>>>>> aliases in the uncore_imc_0 in the first case and 5x aliases in the
>>>>>> uncore_imc_free_running_0 second case.
>>>>>> 
>>>>>> # From 6154-based system:
>>>>>> ls -lhat /sys/devices/uncore_imc_0/events
>>>>> 
>>>>> The "uncore_" prefix and the "_0" suffix are optional, the naming
>>>>> matching is case insensitive. In the event json the events are listed
>>>>> here:
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_skylakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=FpAgVwLmTyXUFQIMZ_gbPlH9aXvrmcJ8CZaW3tKIaj4&e=
>>>>> 
>>>>>> total 0
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.scale
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_read.unit
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.scale
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 18:50 cas_count_write.unit
>>>>>> -r--r--r--. 1 root root 4.0K Aug  9 15:30 cas_count_read
>>>>>> -r--r--r--. 1 root root 4.0K Aug  9 15:30 cas_count_write
>>>>>> -r--r--r--. 1 root root 4.0K Aug  9 15:30 clockticks
>>>>> 
>>>>> This should be 3 sysfs events (I don't like the term alias), note that
>>>>> we load the sysfs and json events lazily to avoid overhead.
>>>>> 
>>>>>> drwxr-xr-x. 2 root root    0 Jul 17 03:40 .
>>>>>> drwxr-xr-x. 5 root root    0 Jul 17 02:52 ..
>>>>>> 
>>>>>> # From the 8352Y-based system:
>>>>>> ls -lhat /sys/bus/event_source/devices/uncore_imc_free_running_0/events
>>>>>> total 0
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.scale
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_read.unit
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.scale
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 ddrt_write.unit
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.scale
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 read.unit
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.scale
>>>>>> -r--r--r--. 1 root root 4.0K Aug 20 01:44 write.unit
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 dclk
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_read
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 ddrt_write
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 read
>>>>>> -r--r--r--. 1 root root 4.0K Aug 19 21:33 write
>>>>> 
>>>>> This is 5 sysfs events, the json events are here:
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_x86_icelakex_uncore-2Dmemory.json-3Fh-3Dperf-2Dtools-2Dnext-23n134&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=RJx661xzakrB42hsUsFD1HhJczkgpaYur9lHVtl7j36__CBOqYfKf4Dnq0xdpBZl&s=MrHuUCZFqrNrd05IPyq4fuZDH4_owkEw0xHcc7bvGvU&e=
>>>>> Note, the "Unit", meaning the PMU should be imc_free_running to match
>>>>> this device.
>>>>> 
>>>>>> drwxr-xr-x. 2 root root    0 Aug 15 03:15 .
>>>>>> drwxr-xr-x. 5 root root    0 Aug 15 02:42 ..
>>>>>> 
>>>>>> Looking at the structure of __test_uncore_pmu_event_aliases, however,
>>>>>> I'm not quite sure how this is supposed to work. I've annotated a walk
>>>>>> through below to highlight where things are going off the rails.
>>>>>> 
>>>>>> static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>>>>>> {
>>>>>> ...
>>>>>>      /* Count how many aliases we generated */
>>>>>>      alias_count = perf_pmu__num_events(pmu);
>>>>>>              // alias_count == 4 in the 6154-based system
>>>>>>              // alias_count == 6 in the 8352Y-based system
>>>>>> 
>>>>>>      /* Count how many aliases we expect from the known table */
>>>>>>      for (table = &test_pmu->aliases[0]; *table; table++)
>>>>>>              to_match_count++;
>>>>>>                      // this is looking at aliases in struct perf_pmu_test_pmu
>>>>>>                      // table, which for uncore_imc_0 is a single entry for
>>>>>>                      // &uncore_imc_cache_hits.
>>>>>>                      //
>>>>>>                      // for the 8352Y case, likewise, we only have a single alias
>>>>>>                      // in the table for &uncore_imc_free_running_cache_miss.
>>>>>>                      //
>>>>>>                      // in both cases, to_match_count == 1
>>>>>> 
>>>>>>      // Compare 4 vs 1 or 6 vs 1
>>>>>>      if (alias_count != to_match_count) {
>>>>>>              pr_debug("testing aliases uncore PMU %s: mismatch expected aliases (%d) vs found (%d)\n",
>>>>>>                       pmu_name, to_match_count /* 1 */, alias_count /* 4 */);
>>>>>>              return -1;
>>>>>>                      // we seemed doomed to hit this conditional always, no?
>>>>>>      }
>>>>>> ...
>>>>>> }
>>>>>> 
>>>>>> I did a walkthrough of the latest mainline code, and don't see a marked
>>>>>> difference that jump off the page to me that'd correct this behavior,
>>>>>> and would love a helping hand to point in the right direction on this.
>>>>>> 
>>>>>> What am I missing here?
>>>>> 
>>>>> I'll need some more time to dig into this. Hopefully the pointers above help.
>>>> 
>>>> Thanks for the quick reply and pointers, I appreciate it. The tricky bit still
>>>> remains, as the code I posted to above seems to solely depend on the
>>>> info filled into struct perf_pmu_test_pmu, right? If so, I don’t see how the
>>>> dots connect from this test to the other events in sysfs/json’s.
>>> 
>>> So looking at the test it is using the testcpu:
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_tests_pmu-2Devents.c-3Fh-3Dperf-2Dtools-2Dnext-23n602&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=upMgwNSdGAw5sdDUTdoyvXhLy4KhFUYqPdxZKx8Ov-ZxDYERFVy8PU040wwDAYPp&s=erGg8kUByjl_j5R0D0PxRZjTZhvazxwC9KW8rOT9Pp4&e=
>>> the json for that is here:
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_test_test-5Fsoc_cpu-3Fh-3Dperf-2Dtools-2Dnext&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=upMgwNSdGAw5sdDUTdoyvXhLy4KhFUYqPdxZKx8Ov-ZxDYERFVy8PU040wwDAYPp&s=z535_TbF_oJLjEoRuhbbqzB9Xo5MwWWmOcP0pgMulWY&e=
>>> The names in the test are based on ones seen on real CPUs, so this may
>>> be leading to the confusion.
>> 
>> Hey Ian,
>> I was able to debug this a bit more. The following diff fixes this test on my system.
>> 
>> Even though we were supposed to be using the test data only, the sysfs entries
>> from my systems, which happened to have similar names, threw a wrench in
>> this test.
>> 
>> With this diff, we just use the JSON aliases that were added.
>> 
>> Happy to send this out as a formal patch, but wanted to get the list’s 2cents
>> first, as I feel like I’m missing something :)
>> 
>> Jon
>> 
>> diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
>> index f5321fbdee79..893dc7afee76 100644
>> --- a/tools/perf/tests/pmu-events.c
>> +++ b/tools/perf/tests/pmu-events.c
>> @@ -584,6 +584,9 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>>        const struct pmu_events_table *events_table;
>>        int res = 0;
>> 
>> +       /* CPU events come from struct pmu_event pmu_events__test_soc_cpu
>> +        * and sys events come from struct pmu_event pmu_events__test_soc_sys
>> +        */
>>        events_table = find_core_events_table("testarch", "testcpu");
>>        if (!events_table)
>>                return -1;
>> @@ -593,10 +596,14 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
>>        pmu->sysfs_aliases_loaded = true;
>>        pmu_add_sys_aliases(pmu);
>> 
>> -       /* Count how many aliases we generated */
>> -       alias_count = perf_pmu__num_events(pmu);
>> +       /* How many events we gathered for this PMU in test_soc.
>> +        * Note: we specifically do not use perf_pmu__num_events as that may
>> +        * include spurious system events from the system under test, which
>> +        * may have similarly named PMUs.
> 
> Thanks Jon, should we just rename the PMUs in the test json files? For
> example, rather than "CBO" here:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_perf_perf-2Dtools-2Dnext.git_tree_tools_perf_pmu-2Devents_arch_test_test-5Fsoc_cpu_uncore.json-3Fh-3Dperf-2Dtools-2Dnext-23n10&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=ColPGeQKTJvgSki3uEmVftry27ANS1v996w_qYNFC9oJe3CApdQ44in4Xn-DEJ-f&s=taxB1JgBs5_gfxc_Jo9emKkuiP70MY1hhm5KrSozfXQ&e= 
> we can have "test_pmu1".

I think that could be a separate cleanup, sure, though I don’t
think that is super pressing. But certainly having it be different
names made this quite confusing to debug and grep thru the
source. I had to resort to the tried-n-true “just printk everything” 
to figure out how this all tied together :) 

I’ll send out my diff as a separate thread to the list in the mean
time.

Jon

> 
> Thanks for investigating this!
> Ian
> 
>> +        */
>> +       alias_count = pmu->cpu_json_aliases + pmu->sys_json_aliases;
>> 
>> -       /* Count how many aliases we expect from the known table */
>> +       /* How many aliases we expect from struct perf_pmu_test_pmu test_pmus */
>>        for (table = &test_pmu->aliases[0]; *table; table++)
>>                to_match_count++;
>> 
>>> 
>>> Thanks,
>>> Ian
>> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ