Message-ID: <20250819172846.GA578379@yaz-khff2.amd.com>
Date: Tue, 19 Aug 2025 13:28:46 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: Adrian Hunter <adrian.hunter@...el.com>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>,
	Tony Luck <tony.luck@...el.com>, pbonzini@...hat.com,
	seanjc@...gle.com, vannapurve@...gle.com,
	Borislav Petkov <bp@...en8.de>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
	H Peter Anvin <hpa@...or.com>, linux-edac@...r.kernel.org,
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
	rick.p.edgecombe@...el.com, kai.huang@...el.com,
	reinette.chatre@...el.com, xiaoyao.li@...el.com,
	tony.lindgren@...ux.intel.com, binbin.wu@...ux.intel.com,
	ira.weiny@...el.com, isaku.yamahata@...el.com,
	Fan Du <fan.du@...el.com>, yan.y.zhao@...el.com, chao.gao@...el.com
Subject: Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in
 recovery for errors in TDX/SEAM non-root mode

On Tue, Aug 19, 2025 at 07:24:34PM +0300, Adrian Hunter wrote:
> Commit 8a01ec97dc066 ("x86/mce: Mask out non-address bits from machine
> check bank") introduced a new #define MCI_ADDR_PHYSADDR for the mask of
> valid physical address bits within the machine check bank address register.
> 
> This is particularly needed in the case of errors in TDX/SEAM non-root mode
> because the reported address contains the TDX KeyID.  Refer to TDX and
> TME-MK documentation for more information about KeyIDs.
> 
> Commit 7911f145de5fe ("x86/mce: Implement recovery for errors in TDX/SEAM
> non-root mode") uses the address to mark the affected page as poisoned, but
> omits to use the aforementioned mask.
> 
> Investigation of user space expectations has concluded it would be more
> correct for the address to contain only address bits in the first place.
> Refer https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com
> 
> Mask the address when it is read from the machine check bank address
> register.  Do not use MCI_ADDR_PHYSADDR because that will be removed in a
> later patch.
> 
> It is assumed __log_error() in arch/x86/kernel/cpu/mce/amd.c does not need
> similar treatment.
> 
> Amend struct mce addr member description slightly to reflect that it is
> not, and never has been, an exact copy of the bank's MCi_ADDR MSR.
> 

I think it would be more accurate to say that the MCi_ADDR MSR is not,
and never has been, guaranteed to be a system physical address.

We could introduce a new field that represents the system physical
address, if one exists for the error type. This way we can operate on a
value without assumption or additional checks. And we can keep the raw
MCi_ADDR MSR value in case it is of value to debug folks or hardware
designers. In my experience, they seem to appreciate having the full,
unfiltered data. We don't give them that today, but we can work towards
that goal.
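
A rough sketch of that idea follows. It is not taken from the commit
linked below, and every name in it (struct err_addr, decode_addr(),
the is_phys_addr flag, the phys_bits parameter) is hypothetical. It
assumes only that bit 58 of MCi_STATUS is ADDRV and that the caller
already knows, per error type, whether MCi_ADDR holds a system
physical address. The raw register value is kept untouched and the
masked address is stored separately with an explicit validity flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* MCi_STATUS.ADDRV: MCi_ADDR holds valid data (bit 58). */
#define STATUS_ADDRV (1ULL << 58)

/* Hypothetical record: raw register value plus a decoded address. */
struct err_addr {
	uint64_t raw_addr;	/* MCi_ADDR exactly as read from hardware */
	uint64_t phys_addr;	/* meaningful only if phys_addr_valid */
	bool phys_addr_valid;	/* true if phys_addr is a system physical address */
};

/* Mask covering the low nbits bits, safe for nbits >= 64. */
static uint64_t low_bits_mask(unsigned int nbits)
{
	return nbits >= 64 ? ~0ULL : (1ULL << nbits) - 1;
}

/*
 * Keep the raw value for debug; fill in phys_addr only when the
 * address is valid and known to be a system physical address.
 * phys_bits is the number of valid physical address bits, so bits
 * above it (for example a KeyID) are dropped from phys_addr.
 */
static struct err_addr decode_addr(uint64_t status, uint64_t mci_addr,
				   unsigned int phys_bits, bool is_phys_addr)
{
	struct err_addr ea = { .raw_addr = mci_addr };

	if ((status & STATUS_ADDRV) && is_phys_addr) {
		ea.phys_addr = mci_addr & low_bits_mask(phys_bits);
		ea.phys_addr_valid = true;
	}

	return ea;
}
```

With a layout like this, code that poisons a page would check
phys_addr_valid and use phys_addr, while logging could still report
raw_addr unmodified.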

I have some old work in this area:
https://github.com/AMDESE/linux/commit/76732c67cbf96c14f55ed1061804db9ff1505ea3

This isn't a quick fix, so maybe we can come back to it if folks are
happy with your current solution.

But I do think there's value in sharing the data as given to us by
hardware, and in providing new interfaces to users if we need to
modify something for them to take action.

Thanks,
Yazen
