linux-kernel - Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0072a5c0cf289b3ba4d209c9c36f54728041e12d.camel@intel.com>
Date: Wed, 18 Jun 2025 00:41:38 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "Zhao, Yan Y" <yan.y.zhao@...el.com>
CC: "quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "Li, Xiaoyao"
	<xiaoyao.li@...el.com>, "Shutemov, Kirill" <kirill.shutemov@...el.com>,
	"Hansen, Dave" <dave.hansen@...el.com>, "david@...hat.com"
	<david@...hat.com>, "thomas.lendacky@....com" <thomas.lendacky@....com>,
	"vbabka@...e.cz" <vbabka@...e.cz>, "tabba@...gle.com" <tabba@...gle.com>,
	"Du, Fan" <fan.du@...el.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"Weiny, Ira" <ira.weiny@...el.com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
	"michael.roth@....com" <michael.roth@....com>, "ackerleytng@...gle.com"
	<ackerleytng@...gle.com>, "Peng, Chao P" <chao.p.peng@...el.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Annapurve, Vishal"
	<vannapurve@...gle.com>, "jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun"
	<jun.miao@...el.com>, "Li, Zhiquan1" <zhiquan1.li@...el.com>,
	"pgonda@...gle.com" <pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge
 pages

On Wed, 2025-06-18 at 08:19 +0800, Yan Zhao wrote:
> > I don't think a potential bug in KVM is a good enough reason. If we are
> > concerned can we think about a warning instead?
> > 
> > We had talked enhancing kasan to know when a page is mapped into S-EPT in
> > the
> > past. So rather than design around potential bugs we could focus on having a
> > simpler implementation with the infrastructure to catch and fix the bugs.
> However, if failing to remove a guest private page would only cause memory
> leak,
> it's fine. 
> If TDX does not hold any refcount, guest_memfd has to know that which private
> page is still mapped. Otherwise, the page may be re-assigned to other kernel
> components while it may still be mapped in the S-EPT.

KASAN detects use-after-free's like that. However, the TDX module code is not
instrumented. It won't check against the KASAN state for it's accesses.

I had a brief chat about this with Dave and Kirill. A couple ideas were
discussed. One was to use page_ext to keep a flag that says the page is in-use
by the TDX module. There was also some discussion of using a normal page flag,
and that the reserved page flag might prevent some of the MM operations that
would be needed on guestmemfd pages. I didn't see the problem when I looked.

For the solution, basically the SEAMCALL wrappers set a flag when they hand a
page to the TDX module, and clear it when they successfully reclaim it via
tdh_mem_page_remove() or tdh_phymem_page_reclaim(). Then if the page makes it
back to the page allocator, a warning is generated.

Also it was mentioned that SGX did have a similar issue to what is being worried
about here:
https://lore.kernel.org/linux-sgx/aCYey1W6i7i3yPLL@gmail.com/T/#m86c8c4cf0e6b9a653bf0709a22bb360034a24d95

> 
> 
> > > 
> > > > > 
> > > > > This would allow guest_memfd to maintain an internal reference count
> > > > > for
> > > > > each
> > > > > private GFN. TDX would call guest_memfd_add_page_ref_count() for
> > > > > mapping
> > > > > and
> > > > > guest_memfd_dec_page_ref_count() after a successful unmapping. Before
> > > > > truncating
> > > > > a private page from the filemap, guest_memfd could increase the real
> > > > > folio
> > > > > reference count based on its internal reference count for the private
> > > > > GFN.
> > > > 
> > > > What does this get us exactly? This is the argument to have less error
> > > > prone
> > > > code that can survive forgetting to refcount on error? I don't see that
> > > > it
> > > > is an
> > > > especially special case.
> > > Yes, for a less error prone code.
> > > 
> > > If this approach is considered too complex for an initial implementation,
> > > using
> > > tdx_hold_page_on_error() is also a viable option.
> > 
> > I'm saying I don't think it's not a good enough reason. Why is it different
> > then
> > other use-after free bugs? I feel like I'm missing something.
> By tdx_hold_page_on_error(), it could be implememented as on removal failure,
> invoke a guest_memfd interface to let guest_memfd know exact ranges still
> being
> under use by the TDX module due to unmapping failures.
> Do you think it's ok?

Either way is ok to me. It seems like we have three ok solutions. But the tone
of the thread is that we are solving some deep problem. Maybe I'm missing
something.