linux-kernel - Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED!

Message-ID: <df8d710c-2543-520e-fe82-dbc8b2a47950@linux.intel.com>
Date:   Wed, 1 Feb 2023 14:06:54 -0500
From:   "Liang, Kan" <kan.liang@...ux.intel.com>
To:     Ian Rogers <irogers@...gle.com>
Cc:     sedat.dilek@...il.com, "Xing, Zhengjun" <zhengjun.xing@...el.com>,
        Arnaldo Carvalho de Melo <acme@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Nathan Chancellor <natechancellor@...il.com>,
        llvm@...ts.linux.dev, Ben Hutchings <benh@...ian.org>,
        James Clark <james.clark@....com>,
        Stephane Eranian <eranian@...gle.com>
Subject: Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED!



On 2023-02-01 12:02 p.m., Ian Rogers wrote:
> On Wed, Feb 1, 2023 at 7:28 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>>
>> Hi Ian,
>>
>> On 2023-01-30 10:55 p.m., Ian Rogers wrote:
>>>>> There's a question about what we should do in the perf test about
>>>>> this? I have a few solutions:
>>>>>
>>>>> 1) try metric tests again with the --metric-no-group flag and don't
>>>>> fail the test if this succeeds. This allows kernel bugs to hide, so
>>>>> I'm not a huge fan.
>>>>>
>>>>> 2) add a new metric flag/constraint to say not to group, this way the
>>>>> metric will automatically apply the "--metric-no-group" flag. It is a
>>>>> bit of work to wire this up but this kind of failure is common enough
>>>>> in PMUs that it is probably worthwhile. We also need to add the flag
>>>>> to metrics and I'm not sure how to get a good list of the metrics that
>>>>> currently fail and require it. This is okay but error prone.
>>>>>
>>>>> 3) fix the kernel bug and let the perf test fail until an adequate
>>>>> kernel is installed. Probably the best option.
>>>>>
>>>> Hi Ian,
>>>>
>>>> I can confirm:
>>>>
>>>> $ echo 0 | sudo tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
>>>> 0
>>>>
>>>> $ ~/bin/perf stat -M tma_l3_bound --metric-no-group -a sleep 1
>>>>
>>>> Performance counter stats for 'system wide':
>>>>
>>>>         2.058.892      MEM_LOAD_UOPS_RETIRED.LLC_HIT          #  1,5 %  tma_l3_bound  (99,30%)
>>>>       173.254.697      CYCLE_ACTIVITY.STALLS_L2_PENDING                                 (99,10%)
>>>>     2.396.130.501      CPU_CLK_UNHALTED.THREAD                                          (99,60%)
>>>>         1.110.486      MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS                              (99,53%)
>>>>
>>>>       1,001989022 seconds time elapsed
>>>>
>>>> $ ~/bin/perf stat -M tma_dram_bound --metric-no-group -a sleep 1
>>>>
>>>> Performance counter stats for 'system wide':
>>>>
>>>>         1.729.208      MEM_LOAD_UOPS_RETIRED.LLC_HIT          #  1,2 %  tma_dram_bound  (99,50%)
>>>>        50.346.734      CYCLE_ACTIVITY.STALLS_L2_PENDING                                   (99,50%)
>>>>     2.354.963.862      CPU_CLK_UNHALTED.THREAD                                            (99,80%)
>>>>           306.500      MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS                                (99,61%)
>>>>
>>>>       1,001981392 seconds time elapsed
>>>>
>>>> Thanks!
>>> Thanks, apparently it is an issue with SandyBridge/IvyBridge that some
>>> counters on one hyperthread will limit what can be on the other. I
>>> believe that's the comment related to EXCL access here:
>>> https://github.com/torvalds/linux/blob/master/arch/x86/events/intel/core.c#L124
>>> So you may have more success with the metric if you disable
>>> hyperthreading, but I imagine that's not a popular option.
>>
>> Thanks for debugging the issue. Yes, it's caused by the HT workaround
>> for SNB/IVB/HSW.
>>
>> The weak group check in the kernel is in validate_group(). It only does
>> a sanity check. It doesn't check all the workarounds and the current
>> status of counters (e.g., whether the fixed counter is occupied by NMI
>> watchdog.) It's possible that a false positive is returned to the perf
>> tool. I once tried to fix the NMI watchdog check in the kernel, but the
>> proposal was rejected. That is why the metric constraint was introduced.
>>
>> For this issue, I think option 2 above is the better and more practical
>> choice. The issue is only observed on old machines, which usually have a
>> stable kernel running on them. I don't think users want to update their
>> kernel just to work around an issue affecting several metrics, but it
>> should be much easier for them to update the perf tool.
>>
>> We know that the below events are the problematic events.
>> /* MEM_UOPS_RETIRED.* */
>> /* MEM_LOAD_UOPS_RETIRED.* */
>> /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
>> /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
>> Can we update the converter script and apply the "--metric-no-group"
>> flag or add a new constraint if the above events are detected on
>> SNB/IVB/HSW?
>>
>> Thanks,
>> Kan
> 
> Thanks Kan,
> 
> We absolutely can do that! In this case should it be --metric-no-group
> only when SMT is enabled? I can do some patches but would like to know
> about whether we need SMT and not SMT versions of --metric-no-group.

The kernel workaround is disabled when SMT is off, so I think we only
need the SMT version of --metric-no-group.
https://lore.kernel.org/all/1416251225-17721-13-git-send-email-eranian@google.com/T/#u
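A minimal sketch of how a tool could decide this at run time, assuming the
sysfs file /sys/devices/system/cpu/smt/active is available and holds "1"
when SMT is active. The function names and the fallback behaviour are
illustrative only, not what the perf tool does:

```python
from pathlib import Path

# Assumption: this sysfs file exists on the running kernel and holds "1"
# when SMT is active and "0" otherwise.
SMT_ACTIVE_PATH = Path("/sys/devices/system/cpu/smt/active")


def smt_is_active(path: Path = SMT_ACTIVE_PATH) -> bool:
    """Return True if the file at `path` reports SMT as active.

    A missing or unreadable file is treated as "SMT off", which is a
    simplification chosen for this sketch.
    """
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return False


def needs_no_group(metric_is_affected: bool,
                   path: Path = SMT_ACTIVE_PATH) -> bool:
    """Hypothetical helper: ungroup an affected metric only when SMT is on."""
    return metric_is_affected and smt_is_active(path)
```

Passing the path as a parameter keeps the check testable on machines
without that sysfs file.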

> Also, should we just have a list of metrics that need the flag or try
> to automate detection? 

I don't think Intel will update the metrics or events for the old
SNB/IVB/HSW platforms. Hard-coding a list of metrics may be simpler than
automated detection.
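As a rough illustration of the hard-coded approach, the sketch below lists
only the two metrics reported as failing in this thread. The model
identifiers and the helper name are hypothetical, and a real list would
need to be collected by testing:

```python
from typing import List

# Metrics reported in this thread as failing when grouped on Ivybridge.
# Assumption: a real list would be longer and gathered by testing.
NO_GROUP_METRICS = frozenset({"tma_l3_bound", "tma_dram_bound"})

# Hypothetical model identifiers for the affected platforms (SNB/IVB/HSW).
AFFECTED_MODELS = frozenset({"sandybridge", "ivybridge", "haswell"})


def perf_stat_args(metric: str, model: str, smt_on: bool) -> List[str]:
    """Build a `perf stat` argument list for one metric.

    --metric-no-group is appended only for a listed metric on an affected
    model with SMT enabled.
    """
    args = ["perf", "stat", "-M", metric]
    if smt_on and model in AFFECTED_MODELS and metric in NO_GROUP_METRICS:
        args.append("--metric-no-group")
    return args
```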

> Some warts in detection are the names of the
> events that vary between Ivybridge and Sandybridge, and how to
> determine which events conflict. For example, the perfmon event data:
> 
> MEM_LOAD_UOPS_RETIRED.LLC_HIT
> https://github.com/intel/perfmon/blob/main/IVB/events/ivybridge_core.json#L5368
> MEM_LOAD_UOPS_RETIRED.LLC_MISS
> https://github.com/intel/perfmon/blob/main/IVB/events/ivybridge_core.json#L5431
> CYCLE_ACTIVITY.STALLS_L2_PENDING
> https://github.com/intel/perfmon/blob/main/IVB/events/ivybridge_core.json#L3541
>

The problematic events should have the same names across platforms. If
matching on the event name doesn't work, the event encoding is exactly the
same across those platforms.
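If detection by name were automated, it could look roughly like the sketch
below. The prefixes come from the list of problematic events quoted
earlier; the regular expression for spotting event names in a metric
expression and the function name are assumptions made for illustration:

```python
import re
from typing import List

# Event-name prefixes from the list of problematic events quoted earlier.
PROBLEM_PREFIXES = (
    "MEM_UOPS_RETIRED.",
    "MEM_LOAD_UOPS_RETIRED.",
    "MEM_LOAD_UOPS_LLC_HIT_RETIRED.",
    "MEM_LOAD_UOPS_LLC_MISS_RETIRED.",
)

# Assumption: event names are upper-case tokens containing at least one dot.
_EVENT_TOKEN = re.compile(r"[A-Z][A-Z0-9_]*(?:\.[A-Z0-9_]+)+")


def problematic_events(metric_expr: str) -> List[str]:
    """Return the sorted, de-duplicated problematic events in an expression."""
    events = _EVENT_TOKEN.findall(metric_expr)
    return sorted({e for e in events if e.startswith(PROBLEM_PREFIXES)})
```

A metric would then be a candidate for --metric-no-group whenever this
returns a non-empty list on SNB/IVB/HSW.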


> The events list all counters, there are no errata fields. Should the
> event data be updated and then in the converter script handle that? If
> I get shown an example I can modify the script accordingly.

If it helps the converter script, I think we can update the errata
field.

Here is the errata information:
 * SNB: BJ122
 * IVB: BV98
 * HSW: HSD29

Here are the details regarding the issue. (Please search for BV98.)
https://www.intel.com/content/www/us/en/content-details/619604/desktop-3rd-generation-intel-core-processor-family-specification-update.html
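To show how the converter script might consume this, here is a sketch that
tags affected events with the erratum for their platform. It assumes each
event is a dict with an "EventName" key and that "Errata" is the key the
script would read; both key names and the helper are assumptions, not the
actual event-file layout:

```python
from typing import Dict, Iterable, List, Tuple

# Errata identifiers as given in this message.
HT_COUNTER_ERRATA = {"SNB": "BJ122", "IVB": "BV98", "HSW": "HSD29"}


def annotate_errata(events: Iterable[Dict[str, str]],
                    platform: str,
                    prefixes: Tuple[str, ...]) -> List[Dict[str, str]]:
    """Return copies of `events` with "Errata" set on affected events.

    Assumption: "EventName" and "Errata" are the relevant keys; events whose
    name does not start with one of `prefixes` are copied unchanged.
    """
    erratum = HT_COUNTER_ERRATA.get(platform)
    annotated = []
    for event in events:
        event = dict(event)  # copy so the caller's data is not modified
        if erratum and event.get("EventName", "").startswith(prefixes):
            event["Errata"] = erratum
        annotated.append(event)
    return annotated
```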
> 
> It is also hard for me to test anything other than SMT on Ivybridge.
> 

I think it's OK to only test on Ivybridge.
The original kernel patch indicates the issue is the same among SNB, IVB
and HSW.
https://lore.kernel.org/all/1416251225-17721-7-git-send-email-eranian@google.com/T/#u

Thanks,
Kan