linux-kernel - Re: [PATCH v4 06/16] x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <92f4d9766a5b381abeac6c4f6afcf0e4cba183d9.camel@intel.com>
Date: Thu, 11 Dec 2025 00:07:49 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-coco@...ts.linux.dev"
	<linux-coco@...ts.linux.dev>, "Huang, Kai" <kai.huang@...el.com>, "Li,
 Xiaoyao" <xiaoyao.li@...el.com>, "Hansen, Dave" <dave.hansen@...el.com>,
	"Zhao, Yan Y" <yan.y.zhao@...el.com>, "Wu, Binbin" <binbin.wu@...el.com>,
	"kas@...nel.org" <kas@...nel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"mingo@...hat.com" <mingo@...hat.com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "tglx@...utronix.de" <tglx@...utronix.de>, "Yamahata,
 Isaku" <isaku.yamahata@...el.com>, "nik.borisov@...e.com"
	<nik.borisov@...e.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "Annapurve, Vishal" <vannapurve@...gle.com>,
	"Gao, Chao" <chao.gao@...el.com>, "bp@...en8.de" <bp@...en8.de>,
	"x86@...nel.org" <x86@...nel.org>
CC: "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>
Subject: Re: [PATCH v4 06/16] x86/virt/tdx: Improve PAMT refcounts allocation
 for sparse memory

On Thu, 2025-11-27 at 09:36 +0200, Nikolay Borisov wrote:
> I agree with your analysis but this needs to be described not only in 
> the commit message but also as a code comment because you intentionally 
> omit locking since that particular pte (at that point) can only have a 
> single user so no race conditions are possible.

While commenting and writing up the log reasoning for why this is safe, I ended
up revisiting why we need to do all this manual PTE modification in the first
place. 

In Kiryl's v2 he had the vmalloc area allocated with VM_IOREMAP. Kai asked why,
and seemingly as a result, it was changed to VM_SPARSE. It turns out the
VM_SPARSE flag is a newer thing for dealing with sparsely populated vmalloc
address space mappings, exactly like we have here. It comes with some helpers
for mapping and unmapping sparse vmalloc ranges. So I played around with these a
bit.

The current apply_to_page_range() version is optimally efficient, but the
allocation is a one-time operation, and the freeing is actually only called in
an error path. I'm not sure it needs to be optimal.

If you use the helpers to populate, you pretty much just need to allocate all
the pages up front, and then call vm_area_map_pages() to map them. To unmap (the
error path), the code pattern is to iterate through the mapping a page at a
time, fetch the page with vmalloc_to_page(), unmap and free the page. At most
this is 2,097,152 iterations. Not fast, but not going to hang boot either.

So rather than the try to justify the locking, I'm thinking to go with a
stupider, simpler mapping/unmapping method that uses the ready made VM_SPARSE
helpers.

I hate to change things at this point, but I think the discussed reasoning is
going to beg the question of why it needs to be complicated.