lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 7 Mar 2022 14:31:21 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>,
        x86@...nel.org, linux-edac@...r.kernel.org,
        linux-kernel@...r.kernel.org, "H . Peter Anvin" <hpa@...or.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Yazen Ghannam <yazen.ghannam@....com>
Subject: Re: [PATCH 2/2] x86/mce: Add per-bank CMCI storm mitigation

On Thu, Feb 17, 2022 at 09:36:50AM -0800, Luck, Tony wrote:
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 4f9abb66520d..1f3e7c074182 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -714,6 +714,8 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>  		barrier();
>  		m.status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
>  
> +		mce_intel_storm_tracker(i, m.status);

Why is this called before the VALID bit check?

Because you want to still run the tracker on each polling - not only
when it sees a valid error?

> diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
> index cee9d989f791..2ed5634ec277 100644
> --- a/arch/x86/kernel/cpu/mce/intel.c
> +++ b/arch/x86/kernel/cpu/mce/intel.c
> @@ -47,8 +47,48 @@ static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);
>   */
>  static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
>  
> +/*
> + * CMCI storm tracking state
> + */

Those could use some comments explaining what is tracking what:

> +static DEFINE_PER_CPU(int, stormy_bank_count);
> +static DEFINE_PER_CPU(u64 [MAX_NR_BANKS], bank_history);
> +static DEFINE_PER_CPU(bool [MAX_NR_BANKS], bank_storm);

AFAICT, this says whether a bank is in storm mode?

> +static DEFINE_PER_CPU(unsigned long [MAX_NR_BANKS], bank_time_stamp);

This looks like it collects the jiffies when the bank was looked at in
the storm tracker.

> +static int cmci_threshold[MAX_NR_BANKS];
> +
>  #define CMCI_THRESHOLD		1
>  
> +/*
> + * High threshold to limit CMCI rate during storms. Max supported is
> + * 0x7FFF. Use this slightly smaller value so it has a distinctive
> + * signature when some asks "Why am I not seeing all corrected errors?"
> + */
> +#define CMCI_STORM_THRESHOLD	0x7FED

Why is a "threshold" in hex?

> +
> +/*
> + * How many errors within the history buffer mark the start of a storm
> + */
> +#define STORM_BEGIN	5

That looks like a STORM_BEGIN_THRESHOLD to me.

> +
> +/*
> + * How many polls of machine check bank without an error before declaring
> + * the storm is over
> + */
> +#define STORM_END	30

Similarly:

STORM_END_POLL_THRESHOLD

> +
> +/*
> + * If there is no poll data for a bank for this amount of time, just
> + * discard the history.
> + */
> +#define STORM_INTERVAL (1 * HZ)

That looks unused.

> +static void cmci_storm_begin(int bank)
> +{
> +	__set_bit(bank, this_cpu_ptr(mce_poll_banks));
> +	this_cpu_write(bank_storm[bank], true);
> +	if (this_cpu_inc_return(stormy_bank_count) == 1)

s/ == 1//

> +		mce_timer_kick(true);
> +}
> +
> +static void cmci_storm_end(int bank)
> +{
> +	__clear_bit(bank, this_cpu_ptr(mce_poll_banks));
> +	this_cpu_write(bank_history[bank], 0ull);
> +	this_cpu_write(bank_storm[bank], false);
> +	if (this_cpu_dec_return(stormy_bank_count) == 0)

	if (!...

> +		mce_timer_kick(false);
> +}
> +
> +void mce_intel_storm_tracker(int bank, u64 status)

Function name needs a verb.

> +{
> +	unsigned long now = jiffies, delta;
> +	unsigned int shift;
> +	u64 history;
> +
> +	delta = now - this_cpu_read(bank_time_stamp[bank]);
> +	shift = this_cpu_read(bank_storm[bank]) ? 1 : (delta + HZBITS) / HZBITS;

Do

	shift = 1;

on function entry to simplify this assignment.

Also, I'm having trouble with this shift calculation. The laptop here has
HZ=250. Let's say delta is 2000 jiffies.

So if this bank wasn't in storm mode, I'd have

shift = (2000 + (250 / 64)) / (250 / 64) = 513

...

Aaaha, so only when the diff is < 250 in my case, i.e., it polls the
same bank within a second, only then it would shift the history. I.e.,
what you mean with that "The 64 bit width corresponds to about one
second."

> +	history = (shift < 64) ? this_cpu_read(bank_history[bank]) << shift : 0;
> +	this_cpu_write(bank_time_stamp[bank], now);
> +
> +	if ((status & (MCI_STATUS_VAL | MCI_STATUS_UC)) == MCI_STATUS_VAL)
> +		history |= 1;
> +	this_cpu_write(bank_history[bank], history);
> +
> +	if (this_cpu_read(bank_storm[bank])) {
> +		if (history & GENMASK_ULL(STORM_END - 1, 0))
> +			return;

Aha, under STORM_END polls you don't declare the storm as being over.

> +		pr_notice("CPU%d BANK%d CMCI storm subsided\n", smp_processor_id(), bank);
> +		cmci_set_threshold(bank, cmci_threshold[bank]);
> +		cmci_storm_end(bank);
> +	} else {
> +		if (hweight64(history) < STORM_BEGIN)
> +			return;

Aha, so you need STORM_BEGIN errors within the last second to cause the
storm polling. Ok I guess.

So all in all I can't find anything eeewy in this - it would need to
have a lot more documentation, though, as this is not the most trivial
thing to stare at.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ