Message-ID: <10587d02-1133-45fa-9ec8-2288a8868b68@intel.com>
Date: Thu, 21 Aug 2025 10:24:22 +0300
From: Adrian Hunter <adrian.hunter@...el.com>
To: Borislav Petkov <bp@...en8.de>
CC: Dave Hansen <dave.hansen@...ux.intel.com>, Tony Luck
	<tony.luck@...el.com>, <pbonzini@...hat.com>, <seanjc@...gle.com>,
	<vannapurve@...gle.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar
	<mingo@...hat.com>, <x86@...nel.org>, H Peter Anvin <hpa@...or.com>,
	<linux-edac@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<kvm@...r.kernel.org>, <rick.p.edgecombe@...el.com>, <kai.huang@...el.com>,
	<reinette.chatre@...el.com>, <xiaoyao.li@...el.com>,
	<tony.lindgren@...ux.intel.com>, <binbin.wu@...ux.intel.com>,
	<ira.weiny@...el.com>, <isaku.yamahata@...el.com>, Fan Du <fan.du@...el.com>,
	Yazen Ghannam <yazen.ghannam@....com>, <yan.y.zhao@...el.com>,
	<chao.gao@...el.com>
Subject: Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in
 recovery for errors in TDX/SEAM non-root mode

On 20/08/2025 00:32, Borislav Petkov wrote:
> On Tue, Aug 19, 2025 at 07:24:34PM +0300, Adrian Hunter wrote:
>> Commit 8a01ec97dc066 ("x86/mce: Mask out non-address bits from machine
>> check bank") introduced a new #define MCI_ADDR_PHYSADDR for the mask of
>> valid physical address bits within the machine check bank address register.
>>
>> This is particularly needed in the case of errors in TDX/SEAM non-root mode
>> because the reported address contains the TDX KeyID.  Refer to TDX and
>> TME-MK documentation for more information about KeyIDs.
>>
>> Commit 7911f145de5fe ("x86/mce: Implement recovery for errors in TDX/SEAM
>> non-root mode") uses the address to mark the affected page as poisoned, but
>> omits to use the aforementioned mask.
>>
>> Investigation of user space expectations has concluded it would be more
>> correct for the address to contain only address bits in the first place.
>> Refer to https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com
>>
>> Mask the address when it is read from the machine check bank address
>> register.  Do not use MCI_ADDR_PHYSADDR because that will be removed in a
>> later patch.
> 
> Why is this patch talking about TDX-something but doing "global" changes to
> mce.addr?

It falls a bit into the category of: a global way of doing things is
easier to maintain than lots of special cases.

> 
> Why don't you simply do a TDX-specific masking out when you're running
> in a TDX env and leave the rest as is?
> 

It was kinda like that in V1:

	https://lore.kernel.org/r/20250618120806.113884-2-adrian.hunter@intel.com/

where the code change was dealing with SEAM_NR in the block starting:

	} else if (m->mcgstatus & MCG_STATUS_SEAM_NR) {

Then Dave asked about changing addr itself:

	https://lore.kernel.org/all/487c5e63-07d3-41ad-bfc0-bda14b3c435e@intel.com/
	https://lore.kernel.org/all/79eca29a-8ba4-4ad9-b2e0-54d8e668f731@intel.com/

And it seems like user space does expect addr to be a physical address:

	https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com

Something like the below would work, but it doesn't answer Dave's question
of why not do the masking in mce_read_aux():

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4da4eab56c81..53c7ea3d0464 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1655,28 +1655,30 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	} else if (m->mcgstatus & MCG_STATUS_SEAM_NR) {
 		/*
 		 * Saved RIP on stack makes it look like the machine check
 		 * was taken in the kernel on the instruction following
 		 * the entry to SEAM mode. But MCG_STATUS_SEAM_NR indicates
 		 * that the machine check was taken inside SEAM non-root
 		 * mode.  CPU core has already marked that guest as dead.
 		 * It is OK for the kernel to resume execution at the
 		 * apparent point of the machine check as the fault did
 		 * not occur there. Mark the page as poisoned so it won't
 		 * be added to free list when the guest is terminated.
 		 */
 		if (mce_usable_address(m)) {
-			struct page *p = pfn_to_online_page(m->addr >> PAGE_SHIFT);
+			struct page *p;
 
+			m->addr &= MCI_ADDR_PHYSADDR;
+			p = pfn_to_online_page(m->addr >> PAGE_SHIFT);
 			if (p)
 				SetPageHWPoison(p);
 		}
 	} else {
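
For illustration only, here is a minimal userspace sketch of what the masking
step above does to the page frame number. It is not kernel code. The helper
names (phys_addr_mask, strip_non_address_bits, addr_to_pfn), the 4 KiB page
size, and the idea that the KeyID sits directly above the usable physical
address bits are assumptions made for the example; the actual mask in the
kernel is MCI_ADDR_PHYSADDR.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper: build a mask covering bits [phys_bits-1:0].
 * This stands in for the kernel's MCI_ADDR_PHYSADDR, whose exact
 * definition is not reproduced here.
 */
static uint64_t phys_addr_mask(unsigned int phys_bits)
{
	return phys_bits >= 64 ? ~0ULL : (1ULL << phys_bits) - 1;
}

/* Drop everything above the physical address bits (e.g. a KeyID). */
static uint64_t strip_non_address_bits(uint64_t addr, unsigned int phys_bits)
{
	return addr & phys_addr_mask(phys_bits);
}

/* Mask first, then shift, mirroring the order used in the diff above. */
static uint64_t addr_to_pfn(uint64_t addr, unsigned int phys_bits)
{
	return strip_non_address_bits(addr, phys_bits) >> SKETCH_PAGE_SHIFT;
}
```

With a hypothetical 46 usable address bits and a KeyID of 3 placed just above
them, a reported address of (3ULL << 46) | 0x12345000 shifts to a bogus PFN
when unmasked, whereas addr_to_pfn() returns 0x12345, which is the page that
should be marked poisoned.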

