lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aWqYlXg4CS6vxS-o@google.com>
Date: Fri, 16 Jan 2026 11:59:17 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Yan Zhao <yan.y.zhao@...el.com>, Ackerley Tng <ackerleytng@...gle.com>, 
	Vishal Annapurve <vannapurve@...gle.com>, pbonzini@...hat.com, linux-kernel@...r.kernel.org, 
	kvm@...r.kernel.org, x86@...nel.org, rick.p.edgecombe@...el.com, 
	kas@...nel.org, tabba@...gle.com, michael.roth@....com, david@...nel.org, 
	sagis@...gle.com, vbabka@...e.cz, thomas.lendacky@....com, 
	nik.borisov@...e.com, pgonda@...gle.com, fan.du@...el.com, jun.miao@...el.com, 
	francescolavra.fl@...il.com, jgross@...e.com, ira.weiny@...el.com, 
	isaku.yamahata@...el.com, xiaoyao.li@...el.com, kai.huang@...el.com, 
	binbin.wu@...ux.intel.com, chao.p.peng@...el.com, chao.gao@...el.com
Subject: Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory

On Fri, Jan 16, 2026, Dave Hansen wrote:
> On 1/16/26 09:14, Sean Christopherson wrote:
> > If you want to assert that the pfn is compatible with TDX, then by
> > all means.  But I am NOT accepting any more KVM code that assumes
> > TDX memory is backed by refcounted struct page.  If I had been
> > paying more attention when the initial TDX series landed, I would
> > have NAK'd that too.
> I'm kinda surprised by that. The only memory we support handing into TDs
> for private memory is refcounted struct page. I can imagine us being
> able to do this with DAX pages in the near future, but those have
> 'struct page' too, and I think they're refcounted pretty normally now as
> well.
> 
> The TDX module initialization is pretty tied to NUMA nodes, too. If it's
> in a NUMA node, the TDX module is told about it and it also universally
> gets a 'struct page'.
> 
> Is there some kind of memory that I'm missing? What else *is* there? :)

I don't want to special case TDX on the backend of KVM's MMU.  There's already
waaaay too much code and complexity in KVM that exists purely for S-EPT.  Baking
in assumptions on how _exactly_ KVM is managing guest memory goes too far.

The reason I'm so hostile towards struct page is that, as evidenced by this series
and a ton of historical KVM code, assuming that memory is backed by struct page is
a _very_ slippery slope towards code that is extremely nasty to unwind later on.

E.g. see all of the effort that ended up going into commit ce7b5695397b ("KVM: TDX:
Drop superfluous page pinning in S-EPT management").  And in this series, the
constraints that will be placed on guest_memfd if TDX assumes hugepages will always
be covered in a single folio.  Untangling KVM's historical (non-TDX) messes around
struct page took us something like two years.

And so to avoid introducing similar messes in the future, I don't want KVM's MMU
to make _any_ references to struct page when it comes to mapping memory into the
guest unless it's absolutely necessary, e.g. to put a reference when KVM _knows_
it acquired a refcounted page via gup() (and ideally we'd kill even that, e.g.
by telling gup() not to bump the refcount in the first place).

> > tdh_mem_page_aug() is just an absurdly slow way of writing a PTE.  It doesn't
> > _need_ the pfn to be backed a struct page, at all.  IMO, what you're asking for
> > is akin to adding a pile of unnecessary assumptions to e.g. __set_spte() and
> > __kvm_tdp_mmu_write_spte().  No thanks.
> 
> Which part is absurdly slow?

The SEAMCALL itself.  I'm saying that TDH_MEM_PAGE_AUG is really just the S-EPT
version of "make this PTE PRESENT", and that piling on sanity checks that aren't
fundamental to TDX shouldn't be done when KVM is writing PTEs.

In other words, something like this is totally fine:

	KVM_MMU_WARN_ON(!tdx_is_convertible_pfn(pfn));

but this is not:

	WARN_ON_ONCE(!page_mapping(pfn_to_page(pfn)));

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ