Message-ID: <afddc163-4b1e-46ee-920a-85de3b347291@amazon.com>
Date: Mon, 26 Jan 2026 16:56:10 +0000
From: Nikita Kalyazin <kalyazin@...zon.com>
To: Ackerley Tng <ackerleytng@...gle.com>, "Edgecombe, Rick P"
	<rick.p.edgecombe@...el.com>, "linux-riscv@...ts.infradead.org"
	<linux-riscv@...ts.infradead.org>, "kalyazin@...zon.co.uk"
	<kalyazin@...zon.co.uk>, "kernel@...0n.name" <kernel@...0n.name>,
	"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>, "linux-fsdevel@...r.kernel.org"
	<linux-fsdevel@...r.kernel.org>, "linux-s390@...r.kernel.org"
	<linux-s390@...r.kernel.org>, "kvmarm@...ts.linux.dev"
	<kvmarm@...ts.linux.dev>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>, "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
	"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
	"loongarch@...ts.linux.dev" <loongarch@...ts.linux.dev>
CC: "david@...nel.org" <david@...nel.org>, "palmer@...belt.com"
	<palmer@...belt.com>, "catalin.marinas@....com" <catalin.marinas@....com>,
	"svens@...ux.ibm.com" <svens@...ux.ibm.com>, "jgross@...e.com"
	<jgross@...e.com>, "surenb@...gle.com" <surenb@...gle.com>,
	"riel@...riel.com" <riel@...riel.com>, "pfalcato@...e.de" <pfalcato@...e.de>,
	"peterx@...hat.com" <peterx@...hat.com>, "x86@...nel.org" <x86@...nel.org>,
	"rppt@...nel.org" <rppt@...nel.org>, "thuth@...hat.com" <thuth@...hat.com>,
	"maz@...nel.org" <maz@...nel.org>, "dave.hansen@...ux.intel.com"
	<dave.hansen@...ux.intel.com>, "ast@...nel.org" <ast@...nel.org>,
	"vbabka@...e.cz" <vbabka@...e.cz>, "Annapurve, Vishal"
	<vannapurve@...gle.com>, "borntraeger@...ux.ibm.com"
	<borntraeger@...ux.ibm.com>, "alex@...ti.fr" <alex@...ti.fr>,
	"pjw@...nel.org" <pjw@...nel.org>, "tglx@...utronix.de" <tglx@...utronix.de>,
	"willy@...radead.org" <willy@...radead.org>, "hca@...ux.ibm.com"
	<hca@...ux.ibm.com>, "wyihan@...gle.com" <wyihan@...gle.com>,
	"ryan.roberts@....com" <ryan.roberts@....com>, "jolsa@...nel.org"
	<jolsa@...nel.org>, "yang@...amperecomputing.com"
	<yang@...amperecomputing.com>, "jmattson@...gle.com" <jmattson@...gle.com>,
	"luto@...nel.org" <luto@...nel.org>, "aneesh.kumar@...nel.org"
	<aneesh.kumar@...nel.org>, "haoluo@...gle.com" <haoluo@...gle.com>,
	"patrick.roy@...ux.dev" <patrick.roy@...ux.dev>, "akpm@...ux-foundation.org"
	<akpm@...ux-foundation.org>, "coxu@...hat.com" <coxu@...hat.com>,
	"mhocko@...e.com" <mhocko@...e.com>, "mlevitsk@...hat.com"
	<mlevitsk@...hat.com>, "jgg@...pe.ca" <jgg@...pe.ca>, "hpa@...or.com"
	<hpa@...or.com>, "song@...nel.org" <song@...nel.org>, "oupton@...nel.org"
	<oupton@...nel.org>, "peterz@...radead.org" <peterz@...radead.org>,
	"maobibo@...ngson.cn" <maobibo@...ngson.cn>, "lorenzo.stoakes@...cle.com"
	<lorenzo.stoakes@...cle.com>, "Liam.Howlett@...cle.com"
	<Liam.Howlett@...cle.com>, "jthoughton@...gle.com" <jthoughton@...gle.com>,
	"martin.lau@...ux.dev" <martin.lau@...ux.dev>, "jhubbard@...dia.com"
	<jhubbard@...dia.com>, "Yu, Yu-cheng" <yu-cheng.yu@...el.com>,
	"Jonathan.Cameron@...wei.com" <Jonathan.Cameron@...wei.com>,
	"eddyz87@...il.com" <eddyz87@...il.com>, "yonghong.song@...ux.dev"
	<yonghong.song@...ux.dev>, "chenhuacai@...nel.org" <chenhuacai@...nel.org>,
	"shuah@...nel.org" <shuah@...nel.org>, "prsampat@....com" <prsampat@....com>,
	"kevin.brodsky@....com" <kevin.brodsky@....com>,
	"shijie@...amperecomputing.com" <shijie@...amperecomputing.com>,
	"suzuki.poulose@....com" <suzuki.poulose@....com>, "itazur@...zon.co.uk"
	<itazur@...zon.co.uk>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
	"yuzenghui@...wei.com" <yuzenghui@...wei.com>, "dev.jain@....com"
	<dev.jain@....com>, "gor@...ux.ibm.com" <gor@...ux.ibm.com>,
	"jackabt@...zon.co.uk" <jackabt@...zon.co.uk>, "daniel@...earbox.net"
	<daniel@...earbox.net>, "agordeev@...ux.ibm.com" <agordeev@...ux.ibm.com>,
	"andrii@...nel.org" <andrii@...nel.org>, "mingo@...hat.com"
	<mingo@...hat.com>, "aou@...s.berkeley.edu" <aou@...s.berkeley.edu>,
	"joey.gouly@....com" <joey.gouly@....com>, "derekmn@...zon.com"
	<derekmn@...zon.com>, "xmarcalx@...zon.co.uk" <xmarcalx@...zon.co.uk>,
	"kpsingh@...nel.org" <kpsingh@...nel.org>, "sdf@...ichev.me"
	<sdf@...ichev.me>, "jackmanb@...gle.com" <jackmanb@...gle.com>,
	"bp@...en8.de" <bp@...en8.de>, "corbet@....net" <corbet@....net>,
	"jannh@...gle.com" <jannh@...gle.com>, "john.fastabend@...il.com"
	<john.fastabend@...il.com>, "kas@...nel.org" <kas@...nel.org>,
	"will@...nel.org" <will@...nel.org>, "seanjc@...gle.com" <seanjc@...gle.com>
Subject: Re: [PATCH v9 07/13] KVM: guest_memfd: Add flag to remove from direct
 map



On 22/01/2026 18:37, Ackerley Tng wrote:
> Nikita Kalyazin <kalyazin@...zon.com> writes:
> 
>> On 16/01/2026 00:00, Edgecombe, Rick P wrote:
>>> On Wed, 2026-01-14 at 13:46 +0000, Kalyazin, Nikita wrote:
>>>> +static void kvm_gmem_folio_restore_direct_map(struct folio *folio)
>>>> +{
>>>> +     /*
>>>> +      * Direct map restoration cannot fail, as the only error condition
>>>> +      * for direct map manipulation is failure to allocate page tables
>>>> +      * when splitting huge pages, but this split would have already
>>>> +      * happened in folio_zap_direct_map() in kvm_gmem_folio_zap_direct_map().
> 
> Do you know if folio_restore_direct_map() will also end up merging page
> table entries to a higher level?
> 
>>>> +      * Thus folio_restore_direct_map() here only updates prot bits.
>>>> +      */
>>>> +     if (kvm_gmem_folio_no_direct_map(folio)) {
>>>> +             WARN_ON_ONCE(folio_restore_direct_map(folio));
>>>> +             folio->private = (void *)((u64)folio->private & ~KVM_GMEM_FOLIO_NO_DIRECT_MAP);
>>>> +     }
>>>> +}
>>>> +
>>>
>>> Does this assume the folio would not have been split after it was zapped? As in,
>>> if it was zapped at 2MB granularity (no 4KB direct map split required) but then
>>> restored at 4KB (split required)? Or it gets merged somehow before this?
> 
> I agree with the rest of the discussion that this will probably land
> before huge page support, so I will have to figure out the intersection
> of the two later.
> 
>>
>> AFAIK it can't be zapped at 2MB granularity as the zapping code will
>> inevitably cause splitting because guest_memfd faults occur at the base
>> page granularity as of now.
> 
> Here's what I'm thinking for now:
> 
> [HugeTLB, no conversions]
> With initial HugeTLB support (no conversions), host userspace
> guest_memfd faults will be:
> 
> + For guest_memfd with PUD-sized pages
>      + At PUD level or PTE level
> + For guest_memfd with PMD-sized pages
>      + At PMD level or PTE level
> 
> Since this guest_memfd doesn't support conversions, the folio is never
> split/merged, so the direct map is restored at whatever level it was
> zapped. I think this works out well.
> 
> [HugeTLB + conversions]
> For a guest_memfd with HugeTLB support and conversions, host userspace
> guest_memfd faults will always be at PTE level, so the direct map will
> be split and the faulted pages have the direct map zapped in 4K chunks
> as they are faulted.
> 
> On conversion back to private, put those back into the direct map
> (putting aside whether to merge the direct map PTEs for now).

Makes sense to me.
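
As a rough illustration of the [HugeTLB + conversions] flow quoted above, here is a toy userspace model (not kernel code). All names (toy_huge_folio, toy_fault, toy_huge_convert_to_private) and the 512-pages-per-huge-page figure are assumptions made for the sketch. It only tracks which 4K chunks are zapped and deliberately does not model merging the direct map PTEs back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption for the sketch: one 2M huge page made of 512 4K pages. */
#define TOY_PAGES_PER_HUGE 512

/* Hypothetical stand-in for a huge guest_memfd folio and its direct map state. */
struct toy_huge_folio {
	bool split;                      /* direct map already split to 4K entries */
	bool zapped[TOY_PAGES_PER_HUGE]; /* per-4K chunk: removed from direct map */
};

/* Host userspace fault at PTE level: split once, zap only the faulted chunk. */
static void toy_fault(struct toy_huge_folio *h, size_t idx)
{
	assert(idx < TOY_PAGES_PER_HUGE);
	h->split = true;
	h->zapped[idx] = true;
}

/*
 * Conversion back to private: put every zapped chunk back into the direct
 * map. Returns how many chunks were restored. The split state is left as is,
 * since merging is set aside in the discussion.
 */
static size_t toy_huge_convert_to_private(struct toy_huge_folio *h)
{
	size_t restored = 0;

	for (size_t i = 0; i < TOY_PAGES_PER_HUGE; i++) {
		if (h->zapped[i]) {
			h->zapped[i] = false;
			restored++;
		}
	}
	return restored;
}
```

In this model, faulting the same chunk twice is harmless, and a second conversion restores nothing because no chunk is zapped anymore.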

> 
> 
> Unfortunately there's no unmapping callback for guest_memfd to use, so
> perhaps the principle should be to put the folios back into the direct
> map ASAP - at unmapping if guest_memfd is doing the unmapping, otherwise
> at freeing time?

I'm not sure I fully understand what you mean here.  What would be the
purpose of hooking into unmapping?  Why would it not be sufficient to
make sure we put folios back into the direct map whenever they are freed
or converted to private?
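
To make the question concrete, here is a toy userspace sketch (not kernel code) of restoring only at those two points. The names (toy_folio, toy_zap, toy_restore, and so on) and the flag value are hypothetical; the flag handling just mirrors the folio->private bit manipulation from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the real KVM_GMEM_FOLIO_NO_DIRECT_MAP comes from the patch. */
#define TOY_NO_DIRECT_MAP ((uintptr_t)1 << 0)

/* Hypothetical stand-in for a guest_memfd folio. */
struct toy_folio {
	void *priv;          /* carries flag bits, like folio->private */
	bool in_direct_map;  /* models whether the direct map entry is present */
};

static bool toy_no_direct_map(const struct toy_folio *f)
{
	return ((uintptr_t)f->priv & TOY_NO_DIRECT_MAP) != 0;
}

/* Remove the folio from the (modeled) direct map and record that in the flag. */
static void toy_zap(struct toy_folio *f)
{
	if (toy_no_direct_map(f))
		return;
	f->in_direct_map = false;
	f->priv = (void *)((uintptr_t)f->priv | TOY_NO_DIRECT_MAP);
}

/* Restore only if the flag says the folio was zapped, then clear the flag. */
static void toy_restore(struct toy_folio *f)
{
	if (!toy_no_direct_map(f))
		return;
	f->in_direct_map = true;
	f->priv = (void *)((uintptr_t)f->priv & ~TOY_NO_DIRECT_MAP);
}

/* The two restore points discussed: conversion to private and freeing. */
static void toy_convert_to_private(struct toy_folio *f)
{
	toy_restore(f);
}

static void toy_free(struct toy_folio *f)
{
	toy_restore(f);
	assert(f->in_direct_map); /* nothing leaves this model zapped */
}
```

Because the restore is keyed off the flag, calling it from both paths is idempotent in this model: whichever of the two runs first does the work and the other is a no-op.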

