linux-kernel - Re: [PATCH 2/2] KVM: TDX: Do not clear poisoned pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8c24d9b9c888eed972e8ee75fa9d31cc7fd72a73.camel@intel.com>
Date: Thu, 26 Jun 2025 01:19:54 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "Hansen, Dave" <dave.hansen@...el.com>, "Hunter, Adrian"
	<adrian.hunter@...el.com>, "Annapurve, Vishal" <vannapurve@...gle.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Li, Xiaoyao"
	<xiaoyao.li@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "Luck, Tony"
	<tony.luck@...el.com>, "tony.lindgren@...ux.intel.com"
	<tony.lindgren@...ux.intel.com>, "Chatre, Reinette"
	<reinette.chatre@...el.com>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
	"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
	"mingo@...hat.com" <mingo@...hat.com>, "linux-edac@...r.kernel.org"
	<linux-edac@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "hpa@...or.com" <hpa@...or.com>, "Edgecombe,
 Rick P" <rick.p.edgecombe@...el.com>, "bp@...en8.de" <bp@...en8.de>, "Gao,
 Chao" <chao.gao@...el.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH 2/2] KVM: TDX: Do not clear poisoned pages

On Wed, 2025-06-25 at 15:38 -0700, Dave Hansen wrote:
> On 6/25/25 15:32, Huang, Kai wrote:
> > > 2. page may have had an integrity violation or a hardware error
> > > (we can't tell which), and PageHWPoison(page) is true
> > Right.  I think the point of avoiding MOVDIR64B to such page is we cannot
> > tell whether it is a hardware error or not.  If it is a hardware error,
> > touching it using MOVDIR64B may cause additional #MC which will panic kernel
> > since now the #MC happens in the kernel context.
> 
> First and foremost, does the code path in question in this case touch
> userspace pages? Or pages that are only "kernel context" in the first place.

The #MC happens when in SEAM non-root mode, i.e., when TDX guest tries to
read it using its TDX private keyID, so practically it is a userspace page.

But for such #MC it doesn't matter because the machine check handler handles
#MC from SEAM non-root separately:

do_machine_check(...)
{
	...

        } else if (m->mcgstatus & MCG_STATUS_SEAM_NR) {
                /*
                 * Saved RIP on stack makes it look like the machine check 
                 * was taken in the kernel on the instruction following    
                 * the entry to SEAM mode. But MCG_STATUS_SEAM_NR indicates
                 * that the machine check was taken inside SEAM non-root   
                 * mode.  CPU core has already marked that guest as dead.  
                 * It is OK for the kernel to resume execution at the      
                 * apparent point of the machine check as the fault did    
                 * not occur there. Mark the page as poisoned so it won't  
                 * be added to free list when the guest is terminated.     
                 */             
                if (mce_usable_address(m)) {                               
                        struct page *p = pfn_to_online_page(m->addr >>
PAGE_SHIFT);                      
                                                                                                         
                        if (p)                                             
                                SetPageHWPoison(p);                        
                }
	...
}

However if the kernel touch the page again using MOVDIR64B, the further #MC
won't have MCG_STATUS_SEAM_NR bit set (because it doesn't happen in SEAM
non-root), therefore it will be treated as a normal kernel #MC which will
result in kernel panic.

> 
> Second, if we can't tell the difference between integrity violation or a
> hardware error and this code needs to clear "integrity violation" poison
> then won't this change just fundamentally break the erratum workaround
> in the first place?!?!

The existing upstream code clears "integrity violation" unconditionally,
which IIUC doesn't work if the page is poisoned due to "hardware error". 
This patch avoids clearing integrity violation to avoid this, only if the
page is marked as poisoned.

The erratum workaround (using MOVDIR64B to clear) works if the page is not
poisoned.  I don't think it will break the erratum workaround.

The whole assumption is if the page is marked as poisoned, the kernel will
never used it again.