Message-ID: <aWb16XJuSVuyRu7l@yzhao56-desk.sh.intel.com>
Date: Wed, 14 Jan 2026 09:48:25 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>
CC: Ackerley Tng <ackerleytng@...gle.com>, Dave Hansen
	<dave.hansen@...el.com>, <pbonzini@...hat.com>, <seanjc@...gle.com>,
	<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, <x86@...nel.org>,
	<rick.p.edgecombe@...el.com>, <kas@...nel.org>, <tabba@...gle.com>,
	<michael.roth@....com>, <david@...nel.org>, <sagis@...gle.com>,
	<vbabka@...e.cz>, <thomas.lendacky@....com>, <nik.borisov@...e.com>,
	<pgonda@...gle.com>, <fan.du@...el.com>, <jun.miao@...el.com>,
	<francescolavra.fl@...il.com>, <jgross@...e.com>, <ira.weiny@...el.com>,
	<isaku.yamahata@...el.com>, <xiaoyao.li@...el.com>, <kai.huang@...el.com>,
	<binbin.wu@...ux.intel.com>, <chao.p.peng@...el.com>, <chao.gao@...el.com>
Subject: Re: [PATCH v3 01/24] x86/tdx: Enhance tdh_mem_page_aug() to support
 huge pages

On Tue, Jan 13, 2026 at 08:50:30AM -0800, Vishal Annapurve wrote:
> On Sun, Jan 11, 2026 at 6:44 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
> >
> > > > The WARN_ON_ONCE() serves 2 purposes:
> > > > 1. Loudly warn of subtle KVM bugs.
> > > > 2. Ensure "page_to_pfn(base_page + i) == (page_to_pfn(base_page) + i)".
> > > >
> > >
> > > I disagree with checking within TDX code, but if you would still like to
> > > check, 2. that you suggested is less dependent on the concept of how the
> > > kernel groups pages in folios, how about:
> > >
> > >   WARN_ON_ONCE(page_to_pfn(base_page + npages - 1) !=
> > >                page_to_pfn(base_page) + npages - 1);
> > >
> > > The full contiguity check will scan every page, but I think this doesn't
> > > take too many CPU cycles, and would probably catch what you're looking
> > > to catch in most cases.
> > > As Dave said, "struct page" serves to guard against MMIO.
> >
> > e.g., with the memory layout below, checking the contiguity of every PFN is
> > still not enough.
> >
> > PFN 0x1000: Normal RAM
> > PFN 0x1001: MMIO
> > PFN 0x1002: Normal RAM
> >
> 
> I don't see how guest_memfd memory can be interspersed with MMIO regions.
It's about API design.

When KVM invokes tdh_phymem_page_wbinvd_hkid() with "struct page *base_page"
and "unsigned long npages", a WARN_ON_ONCE() in tdh_phymem_page_wbinvd_hkid()
that checks those pages all belong to one folio effectively ensures they are
physically contiguous and do not contain MMIO.

This is similar to "VM_WARN_ON_ONCE_FOLIO(!folio_test_large(folio), folio)" in
__folio_split().

Otherwise, why not just pass "pfn + npages" to tdh_phymem_page_wbinvd_hkid()?
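
To make the API-design argument concrete, here is a minimal userspace sketch
(not kernel code). It models the RAM/MMIO/RAM layout quoted earlier and
compares a PFN-contiguity check with a folio-membership check. Everything in
it (struct model_page, the folio_id field, and the helper names) is
hypothetical and exists only for illustration; the real kernel tracks folio
membership differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these names are kernel APIs. */
enum mem_kind { MEM_RAM, MEM_MMIO };

struct model_page {
	unsigned long pfn;	/* physical frame number of this entry */
	enum mem_kind kind;	/* what actually lives at that PFN */
	int folio_id;		/* -1 if the entry belongs to no folio */
};

/* Endpoint-only check, in the spirit of the WARN_ON_ONCE() quoted above. */
static bool endpoints_contiguous(const struct model_page *base, size_t npages)
{
	assert(npages > 0);
	return base[npages - 1].pfn == base[0].pfn + npages - 1;
}

/* Full scan: every PFN must follow its predecessor. */
static bool every_pfn_contiguous(const struct model_page *base, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		if (base[i].pfn != base[0].pfn + i)
			return false;
	return true;
}

/* Folio-style check: all entries must belong to the same folio. */
static bool range_in_one_folio(const struct model_page *base, size_t npages)
{
	assert(npages > 0);
	if (base[0].folio_id < 0)
		return false;
	for (size_t i = 1; i < npages; i++)
		if (base[i].folio_id != base[0].folio_id)
			return false;
	return true;
}

/* Ground truth for the model: is the whole range normal RAM? */
static bool range_is_all_ram(const struct model_page *base, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		if (base[i].kind != MEM_RAM)
			return false;
	return true;
}

/* Layout from the thread: RAM, MMIO, RAM at consecutive PFNs. */
static const struct model_page mixed_layout[3] = {
	{ 0x1000, MEM_RAM,   1 },	/* its own small folio */
	{ 0x1001, MEM_MMIO, -1 },	/* MMIO: no folio at all */
	{ 0x1002, MEM_RAM,   2 },	/* a different small folio */
};

/* Four RAM pages that all belong to one (hypothetical) large folio. */
static const struct model_page folio_layout[4] = {
	{ 0x2000, MEM_RAM, 7 },
	{ 0x2001, MEM_RAM, 7 },
	{ 0x2002, MEM_RAM, 7 },
	{ 0x2003, MEM_RAM, 7 },
};
```

In this model, both endpoints_contiguous() and every_pfn_contiguous() accept
mixed_layout even though range_is_all_ram() is false for it, while
range_in_one_folio() rejects it and accepts folio_layout.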

> Is this in reference to the future extension to add private MMIO
> ranges? I think this discussion belongs in the context of TDX connect
> feature patches. I assume shared/private MMIO assignment to the guests
> will happen via completely different paths. And I would assume EPT
> entries will have information about whether the mapped ranges are MMIO
> or normal memory.
> 
> i.e. Anything mapped as normal memory in SEPT entries as a huge range
> should be safe to operate on without needing to cross-check sanity in
> the KVM TDX stack. If a huge range has MMIO/normal RAM ranges mixed up
> then that is a much bigger problem.
> 
> > Also, is it even safe to reference struct page for PFN 0x1001 (e.g. with
> > SPARSEMEM without SPARSEMEM_VMEMMAP)?
> >
> > Leveraging folio makes it safe and simpler.
> > Since KVM also relies on folio size to determine mapping size, TDX doesn't
> > introduce extra limitations.
> >
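
The SPARSEMEM question quoted above can also be sketched in userspace. With
classic SPARSEMEM (no SPARSEMEM_VMEMMAP), each memory section has its own
separately allocated "struct page" array, so "base_page + i" is only
meaningful while it stays inside one section. The model below is an
assumption-laden illustration, not kernel code: struct sparse_page, the pool
layout, and all helper names are hypothetical. The two per-section arrays are
deliberately placed with a gap between them inside one backing pool, so the
pointer arithmetic stays well defined in C while still landing on the wrong
entry.

```c
#include <assert.h>
#include <stddef.h>

#define PAGES_PER_SECTION	4UL
#define NR_SECTIONS		2UL
#define GAP			4UL	/* unrelated entries between the arrays */

/* Userspace model only: none of these names are kernel APIs. */
struct sparse_page {
	unsigned long section;	/* section whose array holds this entry */
	int valid;		/* 1 if the entry describes a real PFN */
};

/* Section 0 uses pool[0..3]; section 1 uses pool[8..11]. */
static struct sparse_page pool[PAGES_PER_SECTION * NR_SECTIONS + GAP];
static struct sparse_page *section_map[NR_SECTIONS];

static void sparse_init(void)
{
	section_map[0] = &pool[0];
	section_map[1] = &pool[PAGES_PER_SECTION + GAP];

	for (unsigned long s = 0; s < NR_SECTIONS; s++) {
		for (unsigned long i = 0; i < PAGES_PER_SECTION; i++) {
			section_map[s][i].section = s;
			section_map[s][i].valid = 1;
		}
	}
}

/* Look up the entry for a PFN through its section's array. */
static struct sparse_page *model_pfn_to_page(unsigned long pfn)
{
	assert(pfn < PAGES_PER_SECTION * NR_SECTIONS);
	return &section_map[pfn / PAGES_PER_SECTION][pfn % PAGES_PER_SECTION];
}

/* Recover the PFN from an entry using the section it records. */
static unsigned long model_page_to_pfn(const struct sparse_page *page)
{
	assert(page->valid);
	return page->section * PAGES_PER_SECTION +
	       (unsigned long)(page - section_map[page->section]);
}

/* Safe stepping: go through the PFN instead of raw pointer arithmetic. */
static struct sparse_page *model_nth_page(struct sparse_page *page,
					  unsigned long n)
{
	return model_pfn_to_page(model_page_to_pfn(page) + n);
}
```

After sparse_init(), stepping two entries past the entry for PFN 2 with raw
pointer arithmetic lands in the gap between the two arrays, whereas
model_nth_page() returns the entry for PFN 4 in section 1. A range that is
known to sit inside a single folio avoids having to reason about this case at
the call site.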