[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEvNRgEA69UL_=T+vE6z0wxNf59ie9neSbpyrp58_784C8vL9w@mail.gmail.com>
Date: Thu, 8 Jan 2026 11:05:45 -0800
From: Ackerley Tng <ackerleytng@...gle.com>
To: Dave Hansen <dave.hansen@...el.com>, Yan Zhao <yan.y.zhao@...el.com>
Cc: pbonzini@...hat.com, seanjc@...gle.com, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, x86@...nel.org, rick.p.edgecombe@...el.com,
kas@...nel.org, tabba@...gle.com, michael.roth@....com, david@...nel.org,
vannapurve@...gle.com, sagis@...gle.com, vbabka@...e.cz,
thomas.lendacky@....com, nik.borisov@...e.com, pgonda@...gle.com,
fan.du@...el.com, jun.miao@...el.com, francescolavra.fl@...il.com,
jgross@...e.com, ira.weiny@...el.com, isaku.yamahata@...el.com,
xiaoyao.li@...el.com, kai.huang@...el.com, binbin.wu@...ux.intel.com,
chao.p.peng@...el.com, chao.gao@...el.com
Subject: Re: [PATCH v3 01/24] x86/tdx: Enhance tdh_mem_page_aug() to support
huge pages
Dave Hansen <dave.hansen@...el.com> writes:
> On 1/7/26 01:12, Yan Zhao wrote:
> ...
>> However, my understanding is that it's better for functions expecting huge pages
>> to explicitly receive "folio" instead of "page". This way, people can tell from
>> a function's declaration what the function expects. Is this understanding
>> correct?
>
> In a perfect world, maybe.
>
> But, in practice, a 'struct page' can still represent huge pages and
> *does* represent huge pages all over the kernel. There's no need to cram
> a folio in here just because a huge page is involved.
>
>> Passing "start_idx" along with "folio" is due to the requirement of mapping only
>> a sub-range of a huge folio. e.g., we allow creating a 2MB mapping starting from
>> the nth idx of a 1GB folio.
>>
>> On the other hand, if we instead pass "page" to tdh_mem_page_aug() for huge
>> pages and have tdh_mem_page_aug() internally convert it to "folio" and
>> "start_idx", it makes me wonder if we could have previously just passed "pfn" to
>> tdh_mem_page_aug() and had tdh_mem_page_aug() convert it to "page".
>
> As a general pattern, I discourage folks from using pfns and physical
> addresses when passing around references to physical memory. They have
> zero type safety.
>
> It's also not just about type safety. A 'struct page' also *means*
> something. It means that the kernel is, on some level, aware of and
> managing that memory. It's not MMIO. It doesn't represent the physical
> address of the APIC page. It's not SGX memory. It doesn't have a
> Shared/Private bit.
>
I agree that the use of struct pages is better than the use of struct
folios. I think the use of folios unnecessarily couples low level TDX
code to memory metadata (pages and folios) in the kernel.
> All of those properties are important and they're *GONE* if you use a
> pfn. It's even worse if you use a raw physical address.
>
We were thinking through what it would take to have TDs use VM_PFNMAP
memory, where the memory may not actually have associated struct
pages. Without further work, having struct pages in the TDX interface
would kind of lock out those sources of memory. Is TDX open to using
non-kernel managed memory?
> Please don't go back to raw integers (pfns or paddrs).
>
I guess what we're all looking for is a type representing regular memory
(to exclude MMIO/APIC pages/SGX/etc) but isn't limited to memory the
kernel.
Perhaps the best we have now is still pfn/paddrs + nr_pages, and having
the callers of TDX functions handle/ensure the checking required to
exclude unsupported types of memory.
For type safety, would phyrs help? [1] Perhaps starting with pfn/paddrs
+ nr_pages would allow transitioning to phyrs later. Using pages would
be okay for now, but I would rather not use folios.
[1] https://lore.kernel.org/all/YdyKWeU0HTv8m7wD@casper.infradead.org/
>>>> - tdx_clflush_page(page);
>>>> + if (start_idx + npages > folio_nr_pages(folio))
>>>> + return TDX_OPERAND_INVALID;
>>>
>>> Why is this necessary? Would it be a bug if this happens?
>> This sanity check is due to the requirement in KVM that mapping size should be
>> no larger than the backend folio size, which ensures the mapping pages are
>> physically contiguous with homogeneous page attributes. (See the discussion
>> about "EPT mapping size and folio size" in thread [1]).
>>
>> Failure of the sanity check could only be due to bugs in the caller (KVM). I
>> didn't convert the sanity check to an assertion because there's already a
>> TDX_BUG_ON_2() on error following the invocation of tdh_mem_page_aug() in KVM.
>
> We generally don't protect against bugs in callers. Otherwise, we'd have
> a trillion NULL checks in every function in the kernel.
>
> The only reason to add caller sanity checks is to make things easier to
> debug, and those almost always include some kind of spew:
> WARN_ON_ONCE(), pr_warn(), etc...
>
>>>> + for (int i = 0; i < npages; i++)
>>>> + tdx_clflush_page(folio_page(folio, start_idx + i));
>>>
>>> All of the page<->folio conversions are kinda hurting my brain. I think
>>> we need to decide what the canonical type for these things is in TDX, do
>>> the conversion once, and stick with it.
>> Got it!
>>
>> Since passing in base "page" or base "pfn" may still require the
>> wrappers/helpers to internally convert them to "folio" for sanity checks, could
>> we decide that "folio" and "start_idx" are the canonical params for functions
>> expecting huge pages? Or do you prefer KVM to do the sanity check by itself?
>
> I'm not convinced the sanity check is a good idea in the first place. It
> just adds complexity.
Powered by blists - more mailing lists