linux-kernel - Re: [PATCH v4 07/16] x86/virt/tdx: Add tdx_alloc/free


Message-ID: <408079db-c488-492e-b6e7-063dea3cb861@intel.com>
Date: Wed, 3 Dec 2025 12:13:40 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
 "nik.borisov@...e.com" <nik.borisov@...e.com>,
 "kas@...nel.org" <kas@...nel.org>
Cc: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
 "Li, Xiaoyao" <xiaoyao.li@...el.com>, "Huang, Kai" <kai.huang@...el.com>,
 "linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>,
 "Zhao, Yan Y" <yan.y.zhao@...el.com>, "Wu, Binbin" <binbin.wu@...el.com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "seanjc@...gle.com" <seanjc@...gle.com>, "mingo@...hat.com"
 <mingo@...hat.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
 "tglx@...utronix.de" <tglx@...utronix.de>,
 "Yamahata, Isaku" <isaku.yamahata@...el.com>,
 "Annapurve, Vishal" <vannapurve@...gle.com>, "Gao, Chao"
 <chao.gao@...el.com>, "bp@...en8.de" <bp@...en8.de>,
 "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v4 07/16] x86/virt/tdx: Add tdx_alloc/free_page() helpers

On 12/3/25 11:59, Edgecombe, Rick P wrote:
> On Wed, 2025-12-03 at 10:21 -0800, Dave Hansen wrote:
>>> Thanks Dave. Yes, let's stick to the spec. I'm going to try to pull the
>>> loops out too because we can get rid of the union array thing too.
>>
>> Also, I honestly don't see the problem with just allocating an order-1
>> page for this. Yeah, the TDX module doesn't need physically contiguous
>> pages, but it's easier for _us_ to lug them around if they are
>> physically contiguous.
> 
> We have two spin locks to contend with for these allocations. One is the global
> spin lock on the arch/x86 side. In this case, the pages don't have to be
> passed far, like:
> 
> tdx_pamt_get(some_page, NULL)
> 	page1 = alloc()
> 	page2 = alloc()
> 
> 	scoped_guard(spinlock, &pamt_lock) {
> 		tdh_phymem_pamt_add(.., page1, page2)
> 			/* Pack into struct */
> 			seamcall()
> 	}
> 
> I think it's not too bad?

No, that's not bad.

The thing I thought was annoying was in the past when there were a bunch
of functions with two explicit page pointers plumbed into them.

> So if we decide to pass a single order-1 page into tdx_pamt_get() instead of
> order_0_cache, we can stop passing the cache between KVM and arch/x86, but we
> then need two caches instead of one. One for order-0 S-EPT page tables and one
> for order-1 DPAMT page pairs.
> 
> Also, if we have to allocate the order-1 page in each caller, it simplifies the
> arch/x86 code, but duplicates the allocation in the KVM callers (only 2 today
> though).
> 
> So I'm suspicious it's not going to be a big win, but I'll give it a try.

Yeah, the value of doing order-1 is super low if it means managing a
second cache.

>> Plus, if you permanently allocate 2 order-0 pages, you are _probably_
>> going to permanently destroy 2 potential future 2MB pages. The order-1
>> allocation will only destroy 1.
> 
> Doesn't the buddy allocator try to avoid splitting larger blocks? I guess you
> mean in the worst case, but the DPAMT should also not be allocated forever
> either. So I think it's only at the intersection of two worst cases? Worth it?

It's not splitting them in this case. They're *already* split:

# cat /proc/buddyinfo
...
Node 0, zone   Normal  32903  33566 ...

See, there are already ~33,000 4k pages sitting there. Those will be
consumed first on any 4k allocation. So, yeah, it'll avoid splitting an
8k block to get 4k pages normally.
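
For anyone who wants to read those columns programmatically, here is a minimal userspace sketch in C. It assumes the usual buddyinfo layout, where the Nth number after the zone name is the count of free blocks of order N (so the first column is 4k blocks and the second is 8k blocks). The helper names parse_buddyinfo_line() and pages_at_order() are made up for illustration and are not kernel APIs.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ORDERS 16	/* assumption: generous upper bound on columns */

/*
 * Parse one /proc/buddyinfo line, e.g.
 *   "Node 0, zone   Normal  32903  33566 ..."
 * into per-order free block counts. Parsing stops at the first token
 * that is not a number. Returns the number of orders parsed, or -1 if
 * the line has no "zone" field.
 */
static int parse_buddyinfo_line(const char *line,
				unsigned long counts[MAX_ORDERS])
{
	const char *p = strstr(line, "zone");
	int n = 0;

	if (!p)
		return -1;
	p += strlen("zone");

	/* Skip whitespace, then the zone name itself (Normal, DMA32, ...). */
	while (*p && isspace((unsigned char)*p))
		p++;
	while (*p && !isspace((unsigned char)*p))
		p++;

	while (n < MAX_ORDERS) {
		char *end;
		unsigned long v = strtoul(p, &end, 10);

		if (end == p)
			break;	/* no further numeric column */
		counts[n++] = v;
		p = end;
	}
	return n;
}

/* Free memory, in 4k pages, held in blocks of exactly @order. */
static unsigned long pages_at_order(const unsigned long *counts, int order)
{
	return counts[order] << order;
}
```

Feeding it the line quoted above yields 32903 free order-0 blocks and 33566 free order-1 blocks. As long as the order-0 count is nonzero, a 4k request can be satisfied without splitting anything larger.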

BTW, I *DO* expect the DPAMT pages to be mostly allocated forever. Maybe
I'm just a pessimist, but you can't get them back for compaction or
reclaim, so they're basically untouchable. Sure, if you kill all the TDX
guests you get them back, but that's a very different kind of kernel
memory from stuff that's truly reclaimable under pressure.
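
To make the earlier "destroys 2 vs. 1 potential 2MB pages" point concrete, here is a small illustrative sketch in C. It assumes 4k base pages (so 512 pages per 2MB region) and that an order-1 block is naturally aligned to two pages, which is how the buddy allocator hands them out. The function names are hypothetical and exist only for this example.

```c
#define PAGES_PER_2M 512UL	/* 2MB / 4k, assuming 4k base pages */

/* Index of the 2MB-aligned region that contains page frame @pfn. */
static unsigned long region_2m(unsigned long pfn)
{
	return pfn / PAGES_PER_2M;
}

/*
 * Two independently allocated order-0 pages can land anywhere, so they
 * pin one 2MB region if they happen to share it and two otherwise.
 */
static int regions_pinned_order0_pair(unsigned long pfn_a,
				      unsigned long pfn_b)
{
	return region_2m(pfn_a) == region_2m(pfn_b) ? 1 : 2;
}

/*
 * One order-1 block covers an even pfn and the pfn right after it.
 * Since 512 is even, that pair never straddles a 2MB boundary, so the
 * result is always 1. The comparison is kept to show the reasoning.
 */
static int regions_pinned_order1(unsigned long pfn)
{
	unsigned long first = pfn & ~1UL;	/* start of the aligned pair */

	return region_2m(first) == region_2m(first + 1) ? 1 : 2;
}
```

For example, order-0 pages at pfns 511 and 512 sit on opposite sides of a 2MB boundary and pin two regions, while any order-1 block pins exactly one. Whether that difference matters in practice depends on how long the pages stay allocated, which is the point made above.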