lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250915173332.GA869676@yaz-khff2.amd.com>
Date: Mon, 15 Sep 2025 13:33:32 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: Nikolay Borisov <nik.borisov@...e.com>
Cc: x86@...nel.org, Tony Luck <tony.luck@...el.com>,
	"Rafael J. Wysocki" <rafael@...nel.org>,
	linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
	Smita.KoralahalliChannabasappa@....com,
	Qiuxu Zhuo <qiuxu.zhuo@...el.com>, linux-acpi@...r.kernel.org
Subject: Re: [PATCH v6 15/15] x86/mce: Save and use APEI corrected threshold
 limit

On Thu, Sep 11, 2025 at 08:01:17PM +0300, Nikolay Borisov wrote:
> 
> 
> On 9/8/25 18:40, Yazen Ghannam wrote:
> > The MCA threshold limit generally is not something that needs to change
> > during runtime. It is common for a system administrator to decide on a
> > policy for their managed systems.
> > 
> > If MCA thresholding is OS-managed, then the threshold limit must be set
> > at every boot. However, many systems allow the user to set a value in
> > their BIOS. And this is reported through an APEI HEST entry even if
> > thresholding is not in FW-First mode.
> > 
> > Use this value, if available, to set the OS-managed threshold limit.
> > Users can still override it through sysfs if desired for testing or
> > debug.
> > 
> > APEI is parsed after MCE is initialized. So reset the thresholding
> > blocks later to pick up the threshold limit.
> > 
> > Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
> > ---
> > 
> > Notes:
> >      Link:
> >      https://lore.kernel.org/r/20250825-wip-mca-updates-v5-20-865768a2eef8@amd.com
> >      v5->v6:
> >      * No change.
> >      v4->v5:
> >      * No change.
> >      v3->v4:
> >      * New in v4.
> > 
> >   arch/x86/include/asm/mce.h          |  6 ++++++
> >   arch/x86/kernel/acpi/apei.c         |  2 ++
> >   arch/x86/kernel/cpu/mce/amd.c       | 18 ++++++++++++++++--
> >   arch/x86/kernel/cpu/mce/internal.h  |  2 ++
> >   arch/x86/kernel/cpu/mce/threshold.c | 13 +++++++++++++
> >   5 files changed, 39 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> > index 7d6588195d56..1cfbfff0be3f 100644
> > --- a/arch/x86/include/asm/mce.h
> > +++ b/arch/x86/include/asm/mce.h
> > @@ -308,6 +308,12 @@ DECLARE_PER_CPU(struct mce, injectm);
> >   /* Disable CMCI/polling for MCA bank claimed by firmware */
> >   extern void mce_disable_bank(int bank);
> > +#ifdef CONFIG_X86_MCE_THRESHOLD
> > +void mce_save_apei_thr_limit(u32 thr_limit);
> > +#else
> > +static inline void mce_save_apei_thr_limit(u32 thr_limit) { }
> > +#endif /* CONFIG_X86_MCE_THRESHOLD */
> > +
> >   /*
> >    * Exception handler
> >    */
> > diff --git a/arch/x86/kernel/acpi/apei.c b/arch/x86/kernel/acpi/apei.c
> > index 0916f00a992e..e21419e686eb 100644
> > --- a/arch/x86/kernel/acpi/apei.c
> > +++ b/arch/x86/kernel/acpi/apei.c
> > @@ -19,6 +19,8 @@ int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data)
> >   	if (!cmc->enabled)
> >   		return 0;
> > +	mce_save_apei_thr_limit(cmc->notify.error_threshold_value);
> > +
> >   	/*
> >   	 * We expect HEST to provide a list of MC banks that report errors
> >   	 * in firmware first mode. Otherwise, return non-zero value to
> > diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> > index b895559e80ad..9b746080351f 100644
> > --- a/arch/x86/kernel/cpu/mce/amd.c
> > +++ b/arch/x86/kernel/cpu/mce/amd.c
> > @@ -489,6 +489,18 @@ static void threshold_restart_bank(unsigned int bank, bool intr_en)
> >   	}
> >   }
> > +/* Try to use the threshold limit reported through APEI. */
> > +static u16 get_thr_limit(void)
> > +{
> > +	u32 thr_limit = mce_get_apei_thr_limit();
> > +
> > +	/* Fallback to old default if APEI limit is not available. */
> > +	if (!thr_limit)
> > +		return THRESHOLD_MAX;
> > +
> > +	return min(thr_limit, THRESHOLD_MAX);
> > +}
> > +
> >   static void mce_threshold_block_init(struct threshold_block *b, int offset)
> >   {
> >   	struct thresh_restart tr = {
> > @@ -497,7 +509,7 @@ static void mce_threshold_block_init(struct threshold_block *b, int offset)
> >   		.lvt_off		= offset,
> >   	};
> > -	b->threshold_limit		= THRESHOLD_MAX;
> > +	b->threshold_limit		= get_thr_limit();
> >   	threshold_restart_block(&tr);
> >   };
> > @@ -1071,7 +1083,7 @@ static int allocate_threshold_blocks(unsigned int cpu, struct threshold_bank *tb
> >   	b->address		= address;
> >   	b->interrupt_enable	= 0;
> >   	b->interrupt_capable	= lvt_interrupt_supported(bank, high);
> > -	b->threshold_limit	= THRESHOLD_MAX;
> > +	b->threshold_limit	= get_thr_limit();
> >   	if (b->interrupt_capable) {
> >   		default_attrs[2] = &interrupt_enable.attr;
> > @@ -1082,6 +1094,8 @@ static int allocate_threshold_blocks(unsigned int cpu, struct threshold_bank *tb
> >   	list_add(&b->miscj, &tb->miscj);
> > +	mce_threshold_block_init(b, (high & MASK_LVTOFF_HI) >> 20);
> 
> Why is this necessary? Shouldn't this patch consist of mainly
> s/THRESHOLD_MAX/get_thr_limit();
> 
> 
> In allocate_threshold_block have already properly set threshold_limit. So
> this change really ensures threshold_restart_block is being called for the
> given block being initialized. Ignoring the changed threshold limit logic,
> why is this extra call necessary now and wasn't before?
> 

It is necessary to apply the threshold limit to the hardware register.
The MCA thresholding registers are accessed in two passes: first during
per-CPU init, and second when the MCE subsystem devices are created.

The hardware registers are updated in the first pass, and they are left
as-is in the second pass assuming no configuration has changed. That's
why there isn't a "reset" in the second pass.

The APEI tables are parsed between the first and second passes. So now
we need to update the registers during the second pass to apply the
value found from HEST.

Thanks,
Yazen

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ