Message-ID: <38243641-6aea-4d85-aeea-310c849f1043@suse.com>
Date: Wed, 25 Jun 2025 16:22:56 +0300
From: Nikolay Borisov <nik.borisov@...e.com>
To: Yazen Ghannam <yazen.ghannam@....com>, x86@...nel.org,
Tony Luck <tony.luck@...el.com>, "Rafael J. Wysocki" <rafael@...nel.org>,
Len Brown <lenb@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
Smita.KoralahalliChannabasappa@....com, Qiuxu Zhuo <qiuxu.zhuo@...el.com>,
linux-acpi@...r.kernel.org
Subject: Re: [PATCH v4 02/22] x86/mce: Restore poll settings after storm subsides
On 6/24/25 17:15, Yazen Ghannam wrote:
> Users can disable MCA polling by setting the "ignore_ce" parameter or by
> setting "check_interval=0". This tells the kernel to *not* start the MCE
> timer on a CPU.
>
> If the user did not disable CMCI, then storms can occur. When these
> happen, the MCE timer will be started with a fixed interval. After the
> storm subsides, the timer's next interval is set to check_interval.
I think the subject doesn't do justice to what the patch does. What this
change actually does is make the timer function honor CE handling being
disabled, either via ignore_ce or via check_interval being 0, when a CMCI
storm subsides. So a subject along the lines of "Ensure user settings are
considered when CMCI storm subsides" would be more descriptive of what you
are doing.

At the very least you are not restoring anything, because even without
this patch, once the storm subsided you'd start the timer with a value of
'iv'.
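
To make the point concrete, here is a rough user-space sketch of the
re-arm decision at the end of mce_timer_fn() with this patch applied.
Only should_enable_timer() mirrors the patch; struct poll_cfg,
rearm_after_poll() and the HZ value are made up for illustration and do
not exist in the kernel.

```c
#include <stdbool.h>

#define HZ 1000UL	/* assumption: tick rate, value chosen for illustration */

/* Hypothetical stand-in for the one mca_cfg field that matters here. */
struct poll_cfg {
	bool ignore_ce;	/* user asked for corrected errors to be ignored */
};

/* Same condition as the helper added by the patch. */
static bool should_enable_timer(const struct poll_cfg *cfg, unsigned long iv)
{
	return !cfg->ignore_ce && iv;
}

/*
 * Hypothetical model of the tail of mce_timer_fn(): returns true if the
 * timer would be re-armed and stores the delay (in jiffies) in *delay.
 */
static bool rearm_after_poll(const struct poll_cfg *cfg, bool storm_active,
			     unsigned long iv, unsigned long *delay)
{
	if (storm_active) {
		/* During a storm the timer keeps running at a fixed 1s period. */
		*delay = HZ;
		return true;
	}

	/* Storm is over: re-arm only if the user has not disabled polling. */
	if (!should_enable_timer(cfg, iv))
		return false;

	*delay = iv;
	return true;
}
```

Without the patch the last branch is unconditional, so the timer is always
re-armed with 'iv' once the storm ends, whatever the user configured.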
>
> This disregards the user's input through "ignore_ce" and
> "check_interval". Furthermore, if "check_interval=0", then the new timer
> will run faster than expected.
>
> Create a new helper to check these conditions and use it when a CMCI
> storm ends.
>
> Fixes: 7eae17c4add5 ("x86/mce: Add per-bank CMCI storm mitigation")
> Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
> Cc: stable@...r.kernel.org
> ---
>
> Notes:
> Link:
> https://lore.kernel.org/r/20250415-wip-mca-updates-v3-17-8ffd9eb4aa56@amd.com
>
> v3->v4:
> * Update commit message.
> * Move to beginning of set.
> * Note: Polling vs thresholding use case updates not yet addressed.
>
> v2->v3:
> * New in v3.
>
> arch/x86/kernel/cpu/mce/core.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 07d61937427f..ae2e2d8ec99b 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1740,6 +1740,11 @@ static void mc_poll_banks_default(void)
>
>  void (*mc_poll_banks)(void) = mc_poll_banks_default;
>
> +static bool should_enable_timer(unsigned long iv)
> +{
> +	return !mca_cfg.ignore_ce && iv;
> +}
> +
>  static void mce_timer_fn(struct timer_list *t)
>  {
>  	struct timer_list *cpu_t = this_cpu_ptr(&mce_timer);
> @@ -1763,7 +1768,7 @@ static void mce_timer_fn(struct timer_list *t)
>
>  	if (mce_get_storm_mode()) {
>  		__start_timer(t, HZ);
> -	} else {
> +	} else if (should_enable_timer(iv)) {
>  		__this_cpu_write(mce_next_interval, iv);
>  		__start_timer(t, iv);
>  	}
> @@ -2156,7 +2161,7 @@ static void mce_start_timer(struct timer_list *t)
>  {
>  	unsigned long iv = check_interval * HZ;
>
> -	if (mca_cfg.ignore_ce || !iv)
> +	if (!should_enable_timer(iv))
>  		return;
>
>  	this_cpu_write(mce_next_interval, iv);
>