Message-ID: <20250902133712.GB18483@yaz-khff2.amd.com>
Date: Tue, 2 Sep 2025 09:37:13 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: Borislav Petkov <bp@...en8.de>
Cc: x86@...nel.org, Tony Luck <tony.luck@...el.com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
Smita.KoralahalliChannabasappa@....com,
Qiuxu Zhuo <qiuxu.zhuo@...el.com>,
Nikolay Borisov <nik.borisov@...e.com>, linux-acpi@...r.kernel.org
Subject: Re: [PATCH v5 13/20] x86/mce: Unify AMD THR handler with MCA Polling
On Tue, Sep 02, 2025 at 01:10:52PM +0200, Borislav Petkov wrote:
> On Mon, Aug 25, 2025 at 05:33:10PM +0000, Yazen Ghannam wrote:
> > +/*
> > + * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
> > + * goes off when error_count reaches threshold_limit.
> > + */
> > +static void amd_threshold_interrupt(void)
> > +{
> > +	machine_check_poll(MCP_TIMESTAMP, &this_cpu_ptr(&mce_amd_data)->thr_intr_banks);
> > }
>
> So the thresholding interrupt will fire.
>
> It'll call machine_check_poll().
>
> That thing will do something and eventually call back into amd.c again:
>
> 	if (mce_flags.amd_threshold)
> 		amd_reset_thr_limit(i);

This resets only a bank with a valid error.

Also, it resets the limit *before* clearing MCi_STATUS. Clearing the
status should be the last step.

>
> Why the back'n'forth?
>
> Why not:
>
> static void amd_threshold_interrupt(void)
> {
> 	machine_check_poll(MCP_TIMESTAMP, &this_cpu_ptr(&mce_amd_data)->thr_intr_banks);
> 	amd_reset_thr_limit();

This means we'd need to do another loop through the banks. Their
MCi_STATUS registers would already be cleared by then, so they could
log another error before the limit is reset.

Overall, the goal is to loop through the banks one time and log/reset
banks as we go through them.

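As a rough illustration of the ordering concern, here is a toy
user-space model, not kernel code. The struct, its fields, and all
function names are hypothetical and only stand in for MCi_STATUS and
the threshold limit. It counts how often a bank is seen with its
status already cleared while its limit is still waiting to be reset.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_BANKS 4

/* Toy model of one MCA bank; both fields are hypothetical stand-ins. */
struct bank {
	bool valid;        /* models MCi_STATUS holding a valid error */
	bool needs_reset;  /* threshold limit not yet re-armed */
};

/*
 * Count banks whose status is already cleared while their limit is
 * still waiting to be reset. In this model, that is the window in
 * which a bank could log another error before the limit is reset.
 */
static int count_exposed(const struct bank *banks)
{
	int n = 0;

	for (int i = 0; i < NUM_BANKS; i++)
		if (!banks[i].valid && banks[i].needs_reset)
			n++;
	return n;
}

/* One loop: per bank, log, reset the limit, then clear status last. */
static int service_single_pass(struct bank *banks)
{
	int exposed = 0;

	for (int i = 0; i < NUM_BANKS; i++) {
		if (!banks[i].valid)
			continue;
		/* Logging the error would happen here. */
		banks[i].needs_reset = false;   /* reset the limit first */
		exposed += count_exposed(banks);
		assert(!banks[i].needs_reset);  /* limit is re-armed by now */
		banks[i].valid = false;         /* clear status as the last step */
		exposed += count_exposed(banks);
	}
	return exposed;
}

/* Two loops: log and clear every bank first, reset limits afterwards. */
static int service_two_pass(struct bank *banks)
{
	int exposed = 0;

	for (int i = 0; i < NUM_BANKS; i++) {
		if (!banks[i].valid)
			continue;
		banks[i].valid = false;         /* status cleared, limit stale */
		exposed += count_exposed(banks);
	}
	for (int i = 0; i < NUM_BANKS; i++) {
		if (!banks[i].needs_reset)
			continue;
		banks[i].needs_reset = false;
		exposed += count_exposed(banks);
	}
	return exposed;
}
```

With two of four banks holding errors, the single-pass variant never
observes an exposed bank in this model, while the two-pass variant
does. The model says nothing about real hardware timing. It only
shows the ordering.
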
Thanks,
Yazen