[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aWBxFXYPzWnkubNH@yzhao56-desk.sh.intel.com>
Date: Fri, 9 Jan 2026 11:08:05 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Dave Hansen <dave.hansen@...el.com>
CC: <pbonzini@...hat.com>, <seanjc@...gle.com>,
<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, <x86@...nel.org>,
<rick.p.edgecombe@...el.com>, <kas@...nel.org>, <tabba@...gle.com>,
<ackerleytng@...gle.com>, <michael.roth@....com>, <david@...nel.org>,
<vannapurve@...gle.com>, <sagis@...gle.com>, <vbabka@...e.cz>,
<thomas.lendacky@....com>, <nik.borisov@...e.com>, <pgonda@...gle.com>,
<fan.du@...el.com>, <jun.miao@...el.com>, <francescolavra.fl@...il.com>,
<jgross@...e.com>, <ira.weiny@...el.com>, <isaku.yamahata@...el.com>,
<xiaoyao.li@...el.com>, <kai.huang@...el.com>, <binbin.wu@...ux.intel.com>,
<chao.p.peng@...el.com>, <chao.gao@...el.com>
Subject: Re: [PATCH v3 01/24] x86/tdx: Enhance tdh_mem_page_aug() to support
huge pages
On Wed, Jan 07, 2026 at 08:39:55AM -0800, Dave Hansen wrote:
> On 1/7/26 01:12, Yan Zhao wrote:
> ...
> > However, my understanding is that it's better for functions expecting huge pages
> > to explicitly receive "folio" instead of "page". This way, people can tell from
> > a function's declaration what the function expects. Is this understanding
> > correct?
>
> In a perfect world, maybe.
>
> But, in practice, a 'struct page' can still represent huge pages and
> *does* represent huge pages all over the kernel. There's no need to cram
> a folio in here just because a huge page is involved.
Ok. I can modify the param "struct page *page" to "struct page *base_page",
explaining that it may belong to a huge folio but is not necessarily the
head page of the folio.
> > Passing "start_idx" along with "folio" is due to the requirement of mapping only
> > a sub-range of a huge folio. e.g., we allow creating a 2MB mapping starting from
> > the nth idx of a 1GB folio.
> >
> > On the other hand, if we instead pass "page" to tdh_mem_page_aug() for huge
> > pages and have tdh_mem_page_aug() internally convert it to "folio" and
> > "start_idx", it makes me wonder if we could have previously just passed "pfn" to
> > tdh_mem_page_aug() and had tdh_mem_page_aug() convert it to "page".
>
> As a general pattern, I discourage folks from using pfns and physical
> addresses when passing around references to physical memory. They have
> zero type safety.
>
> It's also not just about type safety. A 'struct page' also *means*
> something. It means that the kernel is, on some level, aware of and
> managing that memory. It's not MMIO. It doesn't represent the physical
> address of the APIC page. It's not SGX memory. It doesn't have a
> Shared/Private bit.
>
> All of those properties are important and they're *GONE* if you use a
> pfn. It's even worse if you use a raw physical address.
>
> Please don't go back to raw integers (pfns or paddrs).
I understood and fully accept it.
I previously wondered if we could allow KVM to pass in pfn and let the SEAMCALL
wrapper do the pfn_to_page() conversion.
But it was just out of curiosity. I actually prefer "struct page" too.
> >>> - tdx_clflush_page(page);
> >>> + if (start_idx + npages > folio_nr_pages(folio))
> >>> + return TDX_OPERAND_INVALID;
> >>
> >> Why is this necessary? Would it be a bug if this happens?
> > This sanity check is due to the requirement in KVM that mapping size should be
> > no larger than the backend folio size, which ensures the mapping pages are
> > physically contiguous with homogeneous page attributes. (See the discussion
> > about "EPT mapping size and folio size" in thread [1]).
> >
> > Failure of the sanity check could only be due to bugs in the caller (KVM). I
> > didn't convert the sanity check to an assertion because there's already a
> > TDX_BUG_ON_2() on error following the invocation of tdh_mem_page_aug() in KVM.
>
> We generally don't protect against bugs in callers. Otherwise, we'd have
> a trillion NULL checks in every function in the kernel.
>
> The only reason to add caller sanity checks is to make things easier to
> debug, and those almost always include some kind of spew:
> WARN_ON_ONCE(), pr_warn(), etc...
Would it be better if I use WARN_ON_ONCE()? like this:
u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *base_page,
u64 *ext_err1, u64 *ext_err2)
{
unsigned long npages = tdx_sept_level_to_npages(level);
struct tdx_module_args args = {
.rcx = gpa | level,
.rdx = tdx_tdr_pa(td),
.r8 = page_to_phys(base_page),
};
u64 ret;
WARN_ON_ONCE(page_folio(base_page) != page_folio(base_page + npages - 1));
for (int i = 0; i < npages; i++)
tdx_clflush_page(base_page + i);
ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
*ext_err1 = args.rcx;
*ext_err2 = args.rdx;
return ret;
}
The WARN_ON_ONCE() serves 2 purposes:
1. Loudly warn of subtle KVM bugs.
2. Ensure "page_to_pfn(base_page + i) == (page_to_pfn(base_page) + i)".
If you don't like using "base_page + i" (as the discussion in v2 [1]), we can
invoke folio_page() for the ith page instead.
[1] https://lore.kernel.org/all/01731a9a0346b08577fad75ae560c650145c7f39.camel@intel.com/
> >>> + for (int i = 0; i < npages; i++)
> >>> + tdx_clflush_page(folio_page(folio, start_idx + i));
> >>
> >> All of the page<->folio conversions are kinda hurting my brain. I think
> >> we need to decide what the canonical type for these things is in TDX, do
> >> the conversion once, and stick with it.
> > Got it!
> >
> > Since passing in base "page" or base "pfn" may still require the
> > wrappers/helpers to internally convert them to "folio" for sanity checks, could
> > we decide that "folio" and "start_idx" are the canonical params for functions
> > expecting huge pages? Or do you prefer KVM to do the sanity check by itself?
>
> I'm not convinced the sanity check is a good idea in the first place. It
> just adds complexity.
I'm worried about subtle bugs introduced by careless coding that might be
silently ignored otherwise, like the one in thread [2].
[2] https://lore.kernel.org/kvm/aV2A39fXgzuM4Toa@google.com/
Powered by blists - more mailing lists