linux-kernel - Re: [PATCH] perf/x86/intel: Restrict period on Haswell

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <059d6217-10a5-4d2a-b639-90806c04a13b@linux.intel.com>
Date: Thu, 15 Aug 2024 16:15:59 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Thomas Gleixner <tglx@...utronix.de>, Li Huafei <lihuafei1@...wei.com>,
 peterz@...radead.org, mingo@...hat.com
Cc: acme@...nel.org, namhyung@...nel.org, mark.rutland@....com,
 alexander.shishkin@...ux.intel.com, jolsa@...nel.org, irogers@...gle.com,
 adrian.hunter@...el.com, bp@...en8.de, dave.hansen@...ux.intel.com,
 x86@...nel.org, hpa@...or.com, linux-perf-users@...r.kernel.org,
 linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
 Vince Weaver <vincent.weaver@...ne.edu>
Subject: Re: [PATCH] perf/x86/intel: Restrict period on Haswell



On 2024-08-15 2:26 p.m., Thomas Gleixner wrote:
> Kan!
> 
> On Thu, Aug 15 2024 at 11:39, Kan Liang wrote:
>> On 2024-08-14 6:47 p.m., Thomas Gleixner wrote:
>>> Now the conclusion of this fun exercise is:
>>>
>>>     1) The hardware behaves differently when the perf event happens
>>>        concurrently on HT siblings
>>
>> I think I found a related erratum.
> 
>> HSM154. Fixed-Function Performance Counter May Over Count Instructions
>> Retired by 32 When Intel® Hyper-Threading Technology is Enabled
>>
>> Problem: If, while Intel Hyper-Threading Technology is enabled, the
>> IA32_FIXED_CTR0 MSR
>> (309H) is enabled by setting bits 0 and/or 1 in the
>> IA32_PERF_FIXED_CTR_CTRL MSR
>> (38DH) before setting bit 32 in the IA32_PERF_GLOBAL_CTRL MSR (38FH) then
>> IA32_FIXED_CTR0 may over count by up to 32.
>>
>> Implication: When this erratum occurs, the fixed-function performance
>> counter IA32_FIXED_CTR0 may over count by up to 32.
> 
> Sure. That's only explaining half of the problem.
> 
> As I demonstrated in the non-contended case even with a count of 2 (I
> tried 1 too) the status bit is never set on the second check.
> 

Do you mean the below example? The status bit (32) of the fixed counter
0 is always set.


65.147782: x86_perf_event_set_period: idx:    32 period:         1 left:  2
65.147783: intel_pmu_handle_irq:      loops: 001 status: 100000000

65.147784: x86_perf_event_set_period: idx:    32 period:         1 left:  2
65.147784: intel_pmu_handle_irq:      loops: 002 status: 100000000

> Which is weird, because the number of instructions between setting the
> count and re-checking the status MSR is definitely larger than 2 (or 1).
> 
>> Workaround: The following sequence avoids this erratum (steps 1 and 2
>> are needed if the counter was previously enabled):
>> 1. Clear bit 32 in the IA32_PERF_GLOBAL_CTRL MSR (38FH) and clear bits 1
>> and 0 in the IA32_PERF_FIXED_CTR_CTRL MSR (38DH).
>> 2. Zero the IA32_FIXED_CTR0 MSR.
>> 3. Set bit 32 in the IA32_PERF_GLOBAL_CTRL MSR.
>> 4. Set bits 0 and/or 1 in the IA32_PERF_FIXED_CTR_CTRL MSR as desired.
>>
>> It should explains that the issue is gone with the magic number 32 or
>> disabling the Hyper-Threading.
> 
> It explains only half of it. If you use 32, then the counter is set to
> -32 so the overcount of 32 will still bring it to 0, which should set
> the status bit, no?

I think it's up to 32, not always 32.
I don't have more details regarding the issue. The architect of HSW has
left. I'm asking around internally to find the original bug report of
the erratum. Hope there are more details in the report.

> 
>> I also found a related discussion about 9 years ago.
>> https://lore.kernel.org/lkml/alpine.DEB.2.11.1505181343090.32481@vincent-weaver-1.umelst.maine.edu/
>> Vince tried the workaround but it seems not work.
> 
> Let me play with that. :)

Appreciate it.

> 
>>>     2) The frequency estimation algorithm is broken
>>
>> For the events which occurs frequently, e.g., instructions, cycles, yes,
>> the frequency estimation algorithm doesn't work well.
>>
>> But there are events that may not occur frequently. If a big init period
>> is set, it may be impossible to get the required freq for those events.
>>
>> It's really hard to pick a universal init period that works for all
>> events.
> 
> I understand that, but especially for RETIRED it's obvious :)
> 
>> I'm thinking perf may only calculate/pre-set a init period for the Linux
>> defined architectural events, e.g., instructions, cycles, branches,
>> cache related events, etc. For the other ARCH specific events, I'm
>> afraid the period has to start 1.
> 
> Yes, that would be way better than what we have now.

Great. I will post a patch to improve it.

> 
>>>
>>>     3) Using a 'limit' guestimate is just papering over the underlying
>>>        problems
>>
>> It's possible that a user set a small number with -c. If the number is
>> less than the 'limit', it needs to be adjusted to avoid HW failure.
>> I think the 'limit' is still required.
> 
> I'm not against the limit per se. I'm against guestimated limits which
> are thrown into the code without understanding the underlying problem.
> 
> The just paper over it up to the point where they bite back because the
> guestimate was off by $N.
> 

Got it.

Thanks,
Kan