Date:   Fri, 17 Mar 2023 14:50:00 +0000
From:   Yazen Ghannam <yazen.ghannam@....com>
To:     Tony Luck <tony.luck@...el.com>
Cc:     Borislav Petkov <bp@...en8.de>,
        Smita.KoralahalliChannabasappa@....com,
        dave.hansen@...ux.intel.com, hpa@...or.com,
        linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
        x86@...nel.org, patches@...ts.linux.dev
Subject: Re: [PATCH v2 0/5] Handle corrected machine check interrupt storms

On Mon, Jun 27, 2022 at 10:36:00AM -0700, Tony Luck wrote:
> Extend the logic of handling Intel's corrected machine check interrupt
> storms to AMD's threshold interrupts.
> 
> The first two patches are from Tony; they clean up the existing storm
> handling for Intel and propose per-CPU, per-bank storm handling.
> 
> The third and fourth patches clean up and refactor the CMCI storm
> handling so that a similar workaround can be extended to AMD's threshold
> interrupt storms. These two patches could be merged into Tony's second
> patch of CMCI storm mitigation.
> 
> AMD's storm mitigation for threshold interrupts also uses a per-CPU,
> per-bank approach similar to Intel's. Unlike CMCI storm handling, however,
> it does not set thresholds to reduce the rate of interrupts during a
> storm. Instead, it turns off the interrupt on the current CPU and bank
> when a storm is detected and re-enables it once the storm subsides.
> 
> It is okay to turn off threshold interrupts on AMD systems because other
> error severities continue to be handled while they are off: uncorrected
> errors generate a #MC, and deferred errors have their own separate
> deferred error interrupt. The final patch adds support for handling
> threshold interrupt storms on AMD systems.
> 
> Changes since v1:
> 
> 1) Fix shift computation when keeping track of bank history. The shift
> should be "1" when a storm is in progress (because the bank is polled once
> per second). When a storm is not in progress, the shift should be based
> on the number of seconds since the bank was last checked.
> 
> 2) Changed Smita's code in part 0003 to avoid the use of a function
> pointer (since the kernel avoids indirect branch points that might be
> trainable for various Spectre-like issues).
> 
> Smita Koralahalli (2):
>   x86/mce: Introduce mce_handle_storm() to deal with begin/end of storms
>   x86/mce: Handle AMD threshold interrupt storms
> 
> Tony Luck (3):
>   x86/mce: Remove old CMCI storm mitigation code
>   x86/mce: Add per-bank CMCI storm mitigation
>   x86/mce: Move storm handling to core.
> 
>  arch/x86/kernel/cpu/mce/amd.c      |  49 ++++++++
>  arch/x86/kernel/cpu/mce/core.c     | 139 +++++++++++++++++-----
>  arch/x86/kernel/cpu/mce/intel.c    | 179 +++++++----------------------
>  arch/x86/kernel/cpu/mce/internal.h |  33 ++++--
>  4 files changed, 230 insertions(+), 170 deletions(-)
> 
> --
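The per-bank bookkeeping described in the cover letter (a history of recent checks that is shifted by one while a storm is in progress and by the elapsed seconds otherwise, with AMD switching the threshold interrupt off for the duration of a storm) can be sketched in C roughly as below. This is a minimal model under stated assumptions, not the code from the patches: `struct bank_storm`, `track_bank()`, `handle_storm()` and both threshold values are hypothetical, and the Intel branch only flips a flag standing in for the real CMCI threshold change. The vendor reaction uses a `switch` rather than a function pointer, in the spirit of change 2) above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds for illustration; the real series may differ. */
#define STORM_BEGIN_THRESHOLD 5   /* errors in the 64-check window that start a storm */
#define STORM_END_QUIET_POLLS 30  /* consecutive error-free polls that end a storm */

enum mce_vendor { MCE_VENDOR_INTEL, MCE_VENDOR_AMD };

/* Hypothetical per-CPU, per-bank storm state. */
struct bank_storm {
	uint64_t history;       /* bit 0 = most recent check; 1 = error seen */
	uint64_t last_check;    /* time of the last check, in seconds */
	bool in_storm;
	bool thr_intr_enabled;  /* AMD: threshold interrupt enabled? */
	bool cmci_throttled;    /* Intel: stand-in for a raised CMCI threshold */
};

static int count_bits(uint64_t v)
{
	int n = 0;

	while (v) {
		v &= v - 1;     /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* React to a storm beginning or ending; a switch avoids an indirect call. */
static void handle_storm(struct bank_storm *b, enum mce_vendor v, bool begin)
{
	switch (v) {
	case MCE_VENDOR_AMD:
		/* Turn the threshold interrupt off during a storm, back on after. */
		b->thr_intr_enabled = !begin;
		break;
	case MCE_VENDOR_INTEL:
		/* Placeholder for raising/restoring the CMCI threshold. */
		b->cmci_throttled = begin;
		break;
	}
}

/* Record one check of a bank at time `now` (seconds). */
static void track_bank(struct bank_storm *b, enum mce_vendor v,
		       uint64_t now, bool error_seen)
{
	uint64_t shift;

	assert(now >= b->last_check);   /* time must not go backwards */

	/* Polled once per second during a storm; otherwise use elapsed time. */
	shift = b->in_storm ? 1 : now - b->last_check;

	/* Shifting a 64-bit value by 64 or more is undefined in C: clear it. */
	b->history = shift >= 64 ? 0 : b->history << shift;
	if (error_seen)
		b->history |= 1;
	b->last_check = now;

	if (!b->in_storm) {
		if (count_bits(b->history) >= STORM_BEGIN_THRESHOLD) {
			b->in_storm = true;
			handle_storm(b, v, true);
		}
	} else {
		uint64_t quiet_mask = (UINT64_C(1) << STORM_END_QUIET_POLLS) - 1;

		/* Storm ends once the most recent polls saw no errors. */
		if ((b->history & quiet_mask) == 0) {
			b->in_storm = false;
			handle_storm(b, v, false);
		}
	}
}
```

In this model, `track_bank()` is called once per check with the current time in seconds and whether an error was logged. With the assumed thresholds, errors in five consecutive seconds start a storm (disabling the AMD threshold interrupt), and 30 error-free polls end it and re-enable the interrupt.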

Hi Tony,

Is there an updated version of this set? I can help review and test. Smita is
focusing on other items at the moment.

Thanks!

-Yazen
