Message-ID: <aWkVLViKBgiVGgaI@google.com>
Date: Thu, 15 Jan 2026 08:26:21 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: Ackerley Tng <ackerleytng@...gle.com>, Vishal Annapurve <vannapurve@...gle.com>, pbonzini@...hat.com,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org, x86@...nel.org,
rick.p.edgecombe@...el.com, dave.hansen@...el.com, kas@...nel.org,
tabba@...gle.com, michael.roth@....com, david@...nel.org, sagis@...gle.com,
vbabka@...e.cz, thomas.lendacky@....com, nik.borisov@...e.com,
pgonda@...gle.com, fan.du@...el.com, jun.miao@...el.com,
francescolavra.fl@...il.com, jgross@...e.com, ira.weiny@...el.com,
isaku.yamahata@...el.com, xiaoyao.li@...el.com, kai.huang@...el.com,
binbin.wu@...ux.intel.com, chao.p.peng@...el.com, chao.gao@...el.com
Subject: Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory
On Thu, Jan 15, 2026, Yan Zhao wrote:
> On Wed, Jan 14, 2026 at 07:26:44AM -0800, Sean Christopherson wrote:
> > Ok, with the disclaimer that I hadn't actually looked at the patches in this
> > series before now...
> >
> > TDX absolutely should not be doing _anything_ with folios. I am *very* strongly
> > opposed to TDX assuming that memory is backed by refcounted "struct page", and
> > thus can use folios to glean the maximum mapping size.
> >
> > guest_memfd is _the_ owner of that information. guest_memfd needs to explicitly
> > _tell_ the rest of KVM what the maximum mapping size is; arch code should not
> > infer that size from a folio.
> >
> > And that code+behavior already exists in the form of kvm_gmem_mapping_order() and
> > its users, _and_ is plumbed all the way into tdx_mem_page_aug() as @level. IIUC,
> > the _only_ reason tdx_mem_page_aug() retrieves the page+folio is because
> > tdx_clflush_page() ultimately requires a "struct page". That is absolutely
> > ridiculous and not acceptable. CLFLUSH takes a virtual address, there is *zero*
> > reason tdh_mem_page_aug() needs to require/assume a struct page.
> Not really.
>
> Per my understanding, tdx_mem_page_aug() requires "struct page" (and checks
> folios for huge pages) because the SEAMCALL wrapper APIs are not currently built
> into KVM. Since they may have callers other than KVM, some sanity checking in
> case the caller does something incorrect seems necessary (e.g., in case the
> caller provides an out-of-range struct page or a page with !pfn_valid() PFN).
As I mentioned in my reply to Dave, I don't object to reasonable sanity checks.
> This is similar to "VM_WARN_ON_ONCE_FOLIO(!folio_test_large(folio), folio)" in
> __folio_split().
No, it's not. __folio_split() is verifying that the input matches the one thing
the function is being asked to do: split a huge folio.
TDX requiring guest_memfd to back everything with struct page, and to only use
single, huge folios to map hugepages into the guest, is making completely unnecessary
assumptions about guest_memfd and KVM MMU implementation details.
> With tdx_mem_page_aug() ensuring page validity and contiguity,
It absolutely does not.
- If guest_memfd unmaps the direct map[*], CLFLUSH will fault and panic the
kernel.
- If the PFN isn't backed by struct page, tdx_mem_page_aug() will hit a NULL
pointer deref.
- If the PFN is backed by struct page, but the page is managed by something other
  than guest_memfd or core MM, all bets are off.
[*] https://lore.kernel.org/all/20260114134510.1835-1-kalyazin@amazon.com
> invoking the local static function tdx_clflush_page() page-by-page looks good to
> me. Alternatively, we could convert tdx_clflush_page() to
> tdx_clflush_cache_range(), which receives a VA.
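A VA-based helper is the right shape, yes. Roughly something like the below,
sketched here as userspace C with the _mm_clflush intrinsic so it's
self-contained; in the kernel this would simply be the existing
clflush_cache_range(), and the name/signature here are illustrative only:

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative sketch only: flush a range of cache lines given a virtual
 * address and a length.  The point is that no struct page (or folio) is
 * needed anywhere; CLFLUSH operates purely on virtual addresses.
 */
static void tdx_clflush_cache_range(void *va, size_t size)
{
	const size_t line = 64;	/* assume 64-byte cache lines */
	uintptr_t p = (uintptr_t)va & ~(uintptr_t)(line - 1);
	uintptr_t end = (uintptr_t)va + size;

	/* Flush every cache line that overlaps [va, va + size). */
	for (; p < end; p += line)
		_mm_clflush((void *)p);

	/* Order the flushes before subsequent accesses. */
	_mm_mfence();
}
```

That keeps all the struct-page assumptions out of the SEAMCALL wrappers; the
caller hands in whatever VA it already has for the PFN.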
>
> However, I'm not sure if my understanding is correct now, especially since it
> seems like everyone thinks the SEAMCALL wrapper APIs should trust the caller,
> assuming they are KVM-specific.
It's all kernel code. Implying that KVM is somehow untrusted is absurd.