[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250911161142.GB108087@yaz-khff2.amd.com>
Date: Thu, 11 Sep 2025 12:11:42 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: Nikolay Borisov <nik.borisov@...e.com>
Cc: x86@...nel.org, Tony Luck <tony.luck@...el.com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
Smita.KoralahalliChannabasappa@....com,
Qiuxu Zhuo <qiuxu.zhuo@...el.com>, linux-acpi@...r.kernel.org
Subject: Re: [PATCH v6 12/15] x86/mce/amd: Remove redundant reset_block()
On Thu, Sep 11, 2025 at 05:42:39PM +0300, Nikolay Borisov wrote:
>
>
> On 8.09.25 г. 18:40 ч., Yazen Ghannam wrote:
> > Many of the checks in reset_block() are done again in the block reset
> > function. So drop the redundant checks.
> >
> > Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
>
>
> > ---
> >
> > Notes:
> > Link:
> > https://lore.kernel.org/r/20250825-wip-mca-updates-v5-17-865768a2eef8@amd.com
> > v5->v6:
> > * No change.
> > v4->v5:
> > * No change.
> > v3->v4:
> > * New in v4.
> >
> > arch/x86/kernel/cpu/mce/amd.c | 28 +++++++---------------------
> > 1 file changed, 7 insertions(+), 21 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> > index 34268940c88a..9ca4079ff342 100644
> > --- a/arch/x86/kernel/cpu/mce/amd.c
> > +++ b/arch/x86/kernel/cpu/mce/amd.c
> > @@ -812,29 +812,11 @@ static void amd_deferred_error_interrupt(void)
> > machine_check_poll(MCP_TIMESTAMP, &this_cpu_ptr(&mce_amd_data)->dfr_intr_banks);
> > }
> > -static void reset_block(struct threshold_block *block)
> > -{
> > - struct thresh_restart tr;
> > - u32 low = 0, high = 0;
> > -
> > - if (!block)
> > - return;
> > -
> > - if (rdmsr_safe(block->address, &low, &high))
> > - return;
>
>
> This is being replaced by rdmsr, I guess it's safe because the fact we are
> processing a block which has been on the bank list means it's unlikely the
> rdmsr will fault.
>
Yes, and the MCA register space is predefined even if not all registers
are occupied/implemented/backed by hardware.
The behavior on AMD is that an unused MSR will be Read-as-Zero (RAZ)
rather than cause a #GP.
>
> > -
> > - if (!(high & MASK_OVERFLOW_HI))
> > - return;
>
> nit: However, now, if mask overflow is not set a write to the msr will be
> performed, with the effect that IntType is going to be cleared (hi &=
> ~MASK_INT_TYPE_HI; in threshold_restart_block), and MASK_COUNT_EN_HI will be
> set, that's different than the existing code, albeit it might be ok.
Yes, correct. We may have extra unnecessary writes. This would only
happen if we find a valid error to begin with. And the error is a
deferred error or a corrected error found during polling.
In effect, the register value won't be changed. The control bits will be
set the same way as during initialization.
We could add another flag and try to affect the code flow. But I don't
know that the overhead is worth it.
A lot of this set is trying to do away with extra overhead, duplicate
code, etc.
Thanks,
Yazen
Powered by blists - more mailing lists