linux-kernel - Re: [PATCH] x86/CPU/AMD: Ignore invalid reset reason value

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aKM3moQgdlqr6qIy@google.com>
Date: Mon, 18 Aug 2025 07:24:26 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Borislav Petkov <bp@...en8.de>
Cc: Mario Limonciello <mario.limonciello@....com>, Yazen Ghannam <yazen.ghannam@....com>, x86@...nel.org, 
	linux-kernel@...r.kernel.org, Libing He <libhe@...hat.com>, 
	David Arcari <darcari@...hat.com>
Subject: Re: [PATCH] x86/CPU/AMD: Ignore invalid reset reason value

On Sat, Aug 16, 2025, Borislav Petkov wrote:
> On Fri, Aug 15, 2025 at 03:49:00PM -0700, Sean Christopherson wrote:
> > Which is what I'm suggesting here.  If an MMIO load reads back -1u, then
> > it's a darn good signal that FCH_PM_S5_RESET_STATUS is either unsupported or
> > malfunctioning.  I don't see any reason to drag HYPERVISOR into the mess.
> 
> Ok, let's play through how you suggest it:
> 
> We do not check HYPERVISOR.
> 
> The machine boots on *some* guest and it says:
> 
> "x86/amd: Previous system reset reason [0x1]: "power button was pressed for 4 seconds".
> 
> If this were:
> 
> * a normal KVM guest: the machine is straight lying here. There's no power
> button.
> 
> * "to improve system stability/uptime, e.g. restart the VM if the kernel
> crashes as opposed to rebooting the entire host" - this is basically telling
> the guest owner that the *host* got rebooted. I'm not sure you want to tell
> that to guest owners if you're a cloud vendor.

Most definitely not if the guest owner and host owner are not one and the same.
The example use case is where the platform owner is running one of _their_ kernels
in a VM, in which case that kernel probably does want to know why the platform
reboot.

> * If this guest type is lying: same issue as above.
> 
> So, what guarantees that hypervisors will adhere to the spec and DTRT here?

The same thing that guarantees hardware vendors adhere to specs: the desire to
get paid.

> Hell, hypervisors don't even care about that probably because they don't know
> yet that this thing exists or if they do, they *definitely* want to return an
> error here and not disclose to guest owners why they got rebooted.
> 
> So the only thing that is workable here is if all hypervisors either return an
> error value we do handle or they implement this properly.

And QEMU did return an error value, 0xffffffff, a.k.a. PCI Master Abort / PCIe
Unsupported Request.  I would be amazed if any real world, general purpose VMM
did anything else for an MMIO access to an unknown/unsupported range.

> And because I don't trust hypervisors to do this right because, as I said,
> there needs to be a concentrated effort to support it - otherwise no one
> cares, the *least* we can do here is:
> 
>                 if (s5_reset_reason_txt[i]) {
>                         pr_info("x86/amd: Previous system reset reason [0x%08lx]: %s%s\n",
>                                 value, s5_reset_reason_txt[i], 
>                                 (cpu_feature_enabled(X86_FEATURE_HYPERVISOR) ? " - unreliable: running on a HV" : ""));
>                 }

Huh?  Handle a read of all 0xffs as proposed in this patch, and this is unnecessary.