Message-ID: <20250918211405.GA2180898@yaz-khff2.amd.com>
Date: Thu, 18 Sep 2025 17:14:05 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Nikolay Borisov <nik.borisov@...e.com>,
Bert Karwatzki <spasswolf@....de>, Borislav Petkov <bp@...en8.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-next@...r.kernel.org" <linux-next@...r.kernel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"rafael@...nel.org" <rafael@...nel.org>,
"Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>,
"Smita.KoralahalliChannabasappa@....com" <Smita.KoralahalliChannabasappa@....com>
Subject: Re: spurious mce Hardware Error messages in next-20250912
On Thu, Sep 18, 2025 at 09:04:53PM +0000, Luck, Tony wrote:
> > For the current issue, it does seem that the registers contain junk
> > values. And we are only now seeing this with the recent rework.
>
> Do you try to clear these registers after logging? Or just rely on clearing
> the MCi_STATUS register?
>
Yes, the MCA_DESTAT register is cleared in a couple of places depending
on the scenario.

> If you are clearing, then it isn't working (or new junk values appear quickly).

Right, and MCi_STATUS has junk values in some of the affected banks.
They just happen to be ignored because they don't have the Valid bit
set.
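
A minimal sketch of why junk in MCi_STATUS goes unreported. It assumes the architectural layout in which bit 63 of MCi_STATUS is the Valid (VAL) bit and the remaining bits are only meaningful when VAL is set. The macro is named after the kernel's MCI_STATUS_VAL but is defined locally here. The helper names and sample values are hypothetical; this is not the kernel's MCE code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: MCi_STATUS[63] is the Valid (VAL) bit. */
#define MCI_STATUS_VAL (1ULL << 63)

/* A bank's status is treated as a logged error only if VAL is set.
 * Any other bits, including junk, are ignored when VAL is clear. */
static bool mci_status_valid(uint64_t status)
{
	return (status & MCI_STATUS_VAL) != 0;
}

/* Hypothetical helper: count how many of nbanks status values would be
 * reported. Banks holding nonzero junk without VAL are skipped. */
static size_t count_reported_banks(const uint64_t *status, size_t nbanks)
{
	size_t n = 0;

	assert(status != NULL || nbanks == 0);
	for (size_t i = 0; i < nbanks; i++)
		if (mci_status_valid(status[i]))
			n++;
	return n;
}
```

With banks { 0x0000dead0000beef, 0, VAL | 0x1234 }, only the third bank counts: the first holds junk, but VAL is clear, so it is silently skipped.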

And MCi_STATUS is cleared at Linux init time, so the junk values either
survive the clear or come back by the time Bert ran the script.

Thanks,
Yazen