Message-ID: <a1eff492b192bbe68716b46c18cd7152951c6550.camel@rong.moe>
Date: Tue, 30 Sep 2025 18:18:30 +0800
From: Rong Zhang <i@...g.moe>
To: "Mario Limonciello (AMD) (kernel.org)" <superm1@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, "H. Peter Anvin"
<hpa@...or.com>, Yazen Ghannam <yazen.ghannam@....com>,
linux-kernel@...r.kernel.org, Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH] x86/CPU/AMD: Prevent reset reasons from being retained
among boots
Hi Mario,
On Sun, 2025-09-28 at 23:01 -0500, Mario Limonciello (AMD) (kernel.org)
wrote:
>
> On 9/16/2025 9:02 AM, Borislav Petkov wrote:
> > On Sat, Sep 13, 2025 at 10:42:45PM +0800, Rong Zhang wrote:
> > > The S5_RESET_STATUS register is parsed on boot and printed to kmsg.
> > > However, this could sometimes be misleading and lead to users wasting a
> > > lot of time on meaningless debugging for two reasons:
> > >
> > > * Some bits are never cleared by hardware. It's the software's
> > > responsibility to clear them as per the Processor Programming Reference
> > > (see Link:).
> > >
> > > * Some rare hardware-initiated platform resets do not update the
> > > register at all.
> > >
> > > In both cases, a previous reboot could leave its trace in the register,
> > > resulting in users seeing unrelated reboot reasons while debugging
> > > random reboots afterward.
> >
> > Just a heads-up: we're figuring out internally what the right thing to do here
> > would be.
> >
> > Stay tuned.
> >
> > Thx.
> >
>
> The internal conversation points in the direction of your patch makes sense.
Thanks for your effort!
> But I don't really see a lot of value in re-reading and printing a debug
> message about what was cleared and what's still there. Do you see a
> reason to keep that around?
So that users don't need up-to-date documentation, or any documentation
at all, to distinguish reason bits from non-reason ones.
Let's consider two examples.
(a)
Previous system reset reason [0x08000800]: an uncorrected error...
                                 ^   ^
A user may be confused: two bits are set, but only one reason is
reported. Hmm... is there a hidden failure?
Unless the user has read the PPR, it's hard to realize that BIT(11) is
already set in the register's reset value. The debug message is here to
help:
Cleared system reset reasons [0x08000800 => 0x00000800]
                                 ^   ^         ^   ^
Now the user realizes that BIT(11) has nothing to do with reboot
reasons.
This is exactly the confusion I experienced. I had to spend some time
finding an appropriate public PPR and reading it before realizing this.
(b)
Suppose a future PPR defines BIT(7) as a reset reason that the kernel
does not decode yet.
(nothing is printed to kmsg)
The debug message is here to help:
Cleared system reset reasons [0x00000880 => 0x00000800]
                                     ^^            ^^
Now the user realizes that BIT(7) represents a new reboot reason.
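To make the reading of the "[before => after]" pair concrete, here is a
rough userspace sketch, not the patch itself. The helper names
(cleared_bits, retained_bits, format_reset_diff) are made up for
illustration, and it only assumes what the two examples above show: bits
that disappear after the clear are reason bits, and bits that survive
are not.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Bits set before the clear and gone afterward: these are reset reasons. */
static uint32_t cleared_bits(uint32_t before, uint32_t after)
{
	return before & ~after;
}

/* Bits that survive the clear: not reset reasons (e.g. part of the reset value). */
static uint32_t retained_bits(uint32_t before, uint32_t after)
{
	return before & after;
}

/*
 * Hypothetical formatter that mirrors the "[before => after]" debug
 * message and appends the two masks a reader would otherwise derive
 * by eye. Returns the snprintf() result.
 */
static int format_reset_diff(char *buf, size_t len,
			     uint32_t before, uint32_t after)
{
	return snprintf(buf, len,
			"[0x%08" PRIx32 " => 0x%08" PRIx32 "] "
			"cleared=0x%08" PRIx32 " retained=0x%08" PRIx32,
			before, after,
			cleared_bits(before, after),
			retained_bits(before, after));
}
```

For (a), cleared_bits(0x08000800, 0x00000800) yields 0x08000000, i.e.
BIT(27), while BIT(11) is retained. For (b),
cleared_bits(0x00000880, 0x00000800) yields 0x00000080, i.e. BIT(7).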
Thanks,
Rong