linux-kernel - Re: [PATCH v2 09/16] x86/mce: Unify AMD THR handler with MCA Polling

Message-ID: <8b396843-4505-415e-b989-14bb37245877@amd.com>
Date: Tue, 7 May 2024 12:25:07 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: Borislav Petkov <bp@...en8.de>
Cc: yazen.ghannam@....com, linux-edac@...r.kernel.org,
 linux-kernel@...r.kernel.org, tony.luck@...el.com, x86@...nel.org,
 Avadhut.Naik@....com, John.Allen@....com
Subject: Re: [PATCH v2 09/16] x86/mce: Unify AMD THR handler with MCA Polling

On 5/4/24 10:52 AM, Borislav Petkov wrote:
> On Mon, Apr 29, 2024 at 10:36:57AM -0400, Yazen Ghannam wrote:
>> Related to this, I've been thinking that banks with thresholding enabled
>> should be removed from the list of polling banks. This is done on Intel but
>> not on AMD.
>>
>> I wanted to give it more thought, because I think folks have come to expect
>> polling and thresholding to be independent on AMD.
> 
> Yes, this whole thing sounds weird.
> 
> On the one hand, you have a special interrupt for errors which have
> reached a threshold *just* *so* you don't have to poll. Because polling
> is ok but getting a special interrupt is better and such notification
> systems always want to have a special interrupt and not have to poll.
> 
> On the other hand, you're marrying the two which sounds weird. Why?
> 
> What is wrong with getting thresholding interrupts?
> 

Nothing. This patch is not disabling the interrupt. The goal is to get
rid of duplicate code and have a single function that checks the MCA
banks.

This would be similar to intel_threshold_interrupt().
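
As a rough illustration of the idea (a userspace sketch with made-up
names and simulated bank state, not the code from this patch), both the
polling path and the threshold interrupt path would call the same
bank-checking routine:

```c
/* Hypothetical sketch, not kernel code: every name here is invented. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_BANKS 4

/* Simulated per-bank state standing in for the MCA status registers. */
struct sim_bank {
	bool error_pending;	/* a logged error is waiting to be handled */
};

static struct sim_bank banks[NUM_BANKS];

/* The single shared routine: scan every bank, handle and clear errors. */
static int check_banks(const char *source)
{
	int handled = 0;

	for (int i = 0; i < NUM_BANKS; i++) {
		if (!banks[i].error_pending)
			continue;
		printf("%s: handling error in bank %d\n", source, i);
		banks[i].error_pending = false;
		handled++;
	}
	return handled;
}

/* Periodic polling path. */
static int poll_timer_tick(void)
{
	return check_banks("poll");
}

/* Threshold interrupt path: same scan, different trigger. */
static int threshold_interrupt(void)
{
	return check_banks("thr-irq");
}

/* Test helper: pretend the hardware logged an error in one bank. */
static void inject_error(int bank)
{
	assert(bank >= 0 && bank < NUM_BANKS);
	banks[bank].error_pending = true;
}
```

Whichever path runs first handles and clears the pending errors, so
the other path finds nothing left to do. The interrupt itself stays
enabled; only the duplicated scanning code goes away.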

> Why can't we simply stop the polling and do THR only if available? That
> would save a lot of energy.
> 
> So why can't we program the THR to raise an interrupt on a single error
> and disable polling completely?
> 
> Because that would be a lot better as the hardware would be doing the
> work for us.
> 
> In any case, I'm missing the strategy here so no cleanups without
> a clear goal first please.
>

We could do that. In fact, there's a request to use the threshold that
is pre-programmed in the hardware. And we could use some of the current
kernel parameters for overrides, if needed.

Thanks,
Yazen