linux-kernel - Re: [PATCH v9 2/3] x86/mce: Add per-bank CMCI storm mitigation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20231114192324.GAZVPJLGZmfJBS181/@fat_crate.local>
Date:   Tue, 14 Nov 2023 20:23:24 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     Tony Luck <tony.luck@...el.com>
Cc:     Yazen Ghannam <yazen.ghannam@....com>,
        Smita.KoralahalliChannabasappa@....com,
        dave.hansen@...ux.intel.com, x86@...nel.org,
        linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
        patches@...ts.linux.dev
Subject: Re: [PATCH v9 2/3] x86/mce: Add per-bank CMCI storm mitigation

On Mon, Oct 23, 2023 at 11:14:04AM -0700, Tony Luck wrote:
> I want to track whether each bank is in storm mode, or not. But there's
> no indication when a CMCI is delivered which bank was the source. Code
> just has to scan all the banks, and might find more than one with an
> error. While no bank is in polling mode, there isn't a set time interval
> between scanning banks.

Well, if no bank is in *storm* polling mode - I presume you mean that
when you say "polling mode" - then we have the default polling interval
of machine_check_poll() of INITIAL_CHECK_INTERVAL, i.e., 5 mins, I'd
say.

> A scan is just triggered when a CMCI happens. So it's non-trivial to
> compute a rate. Would require saving a timestamp for every logged
> error.

So what I'm trying to establish first is, what our entry vectors into
the storm code are.

1. You can enter storm tracking when you poll normally. I.e.,
   machine_check_poll() each 5 mins once.

   mce_track_storm() tracks history only for MCI_STATUS_VAL=1b CEs, which
   is what I was wondering. It is hidden a bit down in that function.

2. As a result of a CMCI interrupt. That will call machine_check_poll()
   too and go down the same path.

So that flow should be called out in the commit message so that the big
picture is clear - how we're doing that storm tracking.

Now, what is protecting this against concurrent runs of
machine_check_poll()? Imagine the timer fires and a CMCI happens at the
same time and on the same core.

> In a simple case there's just one bank responsible for a ton of CMCI.
> No need for complexity here, the count of interrupts from that bank will
> hit a threshold and a storm is declared.

Right.

> But more complex scenarois are possible. Other banks may trigger small
> numbers of CMCI. Not enough to call it a storm.  Or multiple banks may
> be screaming together.
> 
> By tracking both the hits and misses in each bank, I end up with a
> bitmap history for the past 64 polls. If there are enough "1" bits in
> that bitmap to meet the threshold, then declare a storm for that
> bank.

Yap, I only want to be crystal clear on the flow and the entry points.

> I need to stare at this again to refresh my memory of what's going on
> here. This code may need pulling apart into a routine that is used for
> systems with no CMCI (or have CMCI disabled). Then the whole "divide the
> poll interval by two" when you see an error and double the interval
> when you don't see an error makes sense.
> 
> For systems with CMCI ... I think just polling a one second interval
> until the storm is over makes sense.

Ok.

> These are only used in threshold.c now. What's the point of them
> being in internal.h. That's for defintiones shared by multiple
> mcs/*.c files. Isn't it? But will move there if you still want this.

Structs hidden in .c files looks weird but ok.

> Ideally the new CPU would inherit the precise state of the previous
> owner of this bank. But there's no easy way to track that as the bank
> is abanoned by the CPU going offline, and there is a free-for-all with
> remaining CPUs racing to claim ownership. It is known that this bank
> was in storm mode (because the threshold in the CTL2 bank register is
> set to CMCI_STORM_THRESHOLD).
> 
> I went with "worst case" to make sure the new CPU didn't prematurely
> declare an end to the storm.
> 
> I'll add a comment in mce_inherit_storm() to explain this.

Yap, exactly.

> Like this?
> 
> #define NUM_HISTORY_BITS (sizeof(u64) * BITS_PER_BYTE)
> 
> 	if (shift < NUM_HISTORY_BITS)

Yap.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette