Message-ID: <CAEvNRgEvd9tSwrkaYrQyibO2DP99vgVj6_zr=jBH5+zMnJwYbA@mail.gmail.com>
Date: Thu, 22 Jan 2026 10:37:37 -0800
From: Ackerley Tng <ackerleytng@...gle.com>
To: kalyazin@...zon.com, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>,
"kalyazin@...zon.co.uk" <kalyazin@...zon.co.uk>, "kernel@...0n.name" <kernel@...0n.name>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
"kvmarm@...ts.linux.dev" <kvmarm@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"loongarch@...ts.linux.dev" <loongarch@...ts.linux.dev>
Cc: "david@...nel.org" <david@...nel.org>, "palmer@...belt.com" <palmer@...belt.com>,
"catalin.marinas@....com" <catalin.marinas@....com>, "svens@...ux.ibm.com" <svens@...ux.ibm.com>,
"jgross@...e.com" <jgross@...e.com>, "surenb@...gle.com" <surenb@...gle.com>,
"riel@...riel.com" <riel@...riel.com>, "pfalcato@...e.de" <pfalcato@...e.de>,
"peterx@...hat.com" <peterx@...hat.com>, "x86@...nel.org" <x86@...nel.org>, "rppt@...nel.org" <rppt@...nel.org>,
"thuth@...hat.com" <thuth@...hat.com>, "maz@...nel.org" <maz@...nel.org>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "ast@...nel.org" <ast@...nel.org>,
"vbabka@...e.cz" <vbabka@...e.cz>, "Annapurve, Vishal" <vannapurve@...gle.com>,
"borntraeger@...ux.ibm.com" <borntraeger@...ux.ibm.com>, "alex@...ti.fr" <alex@...ti.fr>,
"pjw@...nel.org" <pjw@...nel.org>, "tglx@...utronix.de" <tglx@...utronix.de>,
"willy@...radead.org" <willy@...radead.org>, "hca@...ux.ibm.com" <hca@...ux.ibm.com>,
"wyihan@...gle.com" <wyihan@...gle.com>, "ryan.roberts@....com" <ryan.roberts@....com>,
"jolsa@...nel.org" <jolsa@...nel.org>,
"yang@...amperecomputing.com" <yang@...amperecomputing.com>, "jmattson@...gle.com" <jmattson@...gle.com>,
"luto@...nel.org" <luto@...nel.org>, "aneesh.kumar@...nel.org" <aneesh.kumar@...nel.org>,
"haoluo@...gle.com" <haoluo@...gle.com>, "patrick.roy@...ux.dev" <patrick.roy@...ux.dev>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "coxu@...hat.com" <coxu@...hat.com>,
"mhocko@...e.com" <mhocko@...e.com>, "mlevitsk@...hat.com" <mlevitsk@...hat.com>, "jgg@...pe.ca" <jgg@...pe.ca>,
"hpa@...or.com" <hpa@...or.com>, "song@...nel.org" <song@...nel.org>, "oupton@...nel.org" <oupton@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>, "maobibo@...ngson.cn" <maobibo@...ngson.cn>,
"lorenzo.stoakes@...cle.com" <lorenzo.stoakes@...cle.com>,
"Liam.Howlett@...cle.com" <Liam.Howlett@...cle.com>, "jthoughton@...gle.com" <jthoughton@...gle.com>,
"martin.lau@...ux.dev" <martin.lau@...ux.dev>, "jhubbard@...dia.com" <jhubbard@...dia.com>,
"Yu, Yu-cheng" <yu-cheng.yu@...el.com>,
"Jonathan.Cameron@...wei.com" <Jonathan.Cameron@...wei.com>, "eddyz87@...il.com" <eddyz87@...il.com>,
"yonghong.song@...ux.dev" <yonghong.song@...ux.dev>, "chenhuacai@...nel.org" <chenhuacai@...nel.org>,
"shuah@...nel.org" <shuah@...nel.org>, "prsampat@....com" <prsampat@....com>,
"kevin.brodsky@....com" <kevin.brodsky@....com>,
"shijie@...amperecomputing.com" <shijie@...amperecomputing.com>,
"suzuki.poulose@....com" <suzuki.poulose@....com>, "itazur@...zon.co.uk" <itazur@...zon.co.uk>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "yuzenghui@...wei.com" <yuzenghui@...wei.com>,
"dev.jain@....com" <dev.jain@....com>, "gor@...ux.ibm.com" <gor@...ux.ibm.com>,
"jackabt@...zon.co.uk" <jackabt@...zon.co.uk>, "daniel@...earbox.net" <daniel@...earbox.net>,
"agordeev@...ux.ibm.com" <agordeev@...ux.ibm.com>, "andrii@...nel.org" <andrii@...nel.org>,
"mingo@...hat.com" <mingo@...hat.com>, "aou@...s.berkeley.edu" <aou@...s.berkeley.edu>,
"joey.gouly@....com" <joey.gouly@....com>, "derekmn@...zon.com" <derekmn@...zon.com>,
"xmarcalx@...zon.co.uk" <xmarcalx@...zon.co.uk>, "kpsingh@...nel.org" <kpsingh@...nel.org>,
"sdf@...ichev.me" <sdf@...ichev.me>, "jackmanb@...gle.com" <jackmanb@...gle.com>, "bp@...en8.de" <bp@...en8.de>,
"corbet@....net" <corbet@....net>, "jannh@...gle.com" <jannh@...gle.com>,
"john.fastabend@...il.com" <john.fastabend@...il.com>, "kas@...nel.org" <kas@...nel.org>,
"will@...nel.org" <will@...nel.org>, "seanjc@...gle.com" <seanjc@...gle.com>
Subject: Re: [PATCH v9 07/13] KVM: guest_memfd: Add flag to remove from direct map

Nikita Kalyazin <kalyazin@...zon.com> writes:

> On 16/01/2026 00:00, Edgecombe, Rick P wrote:
>> On Wed, 2026-01-14 at 13:46 +0000, Kalyazin, Nikita wrote:
>>> +static void kvm_gmem_folio_restore_direct_map(struct folio *folio)
>>> +{
>>> +	/*
>>> +	 * Direct map restoration cannot fail, as the only error condition
>>> +	 * for direct map manipulation is failure to allocate page tables
>>> +	 * when splitting huge pages, but this split would have already
>>> +	 * happened in folio_zap_direct_map() in kvm_gmem_folio_zap_direct_map().

Do you know if folio_restore_direct_map() will also end up merging page
table entries to a higher level?

>>> +	 * Thus folio_restore_direct_map() here only updates prot bits.
>>> +	 */
>>> +	if (kvm_gmem_folio_no_direct_map(folio)) {
>>> +		WARN_ON_ONCE(folio_restore_direct_map(folio));
>>> +		folio->private = (void *)((u64)folio->private & ~KVM_GMEM_FOLIO_NO_DIRECT_MAP);
>>> +	}
>>> +}
>>> +
>>
>> Does this assume the folio would not have been split after it was zapped? As in,
>> if it was zapped at 2MB granularity (no 4KB direct map split required) but then
>> restored at 4KB (split required)? Or it gets merged somehow before this?

I agree with the rest of the discussion that this will probably land
before huge page support, so I will have to figure out the intersection
of the two later.

>
> AFAIK it can't be zapped at 2MB granularity as the zapping code will
> inevitably cause splitting because guest_memfd faults occur at the base
> page granularity as of now.

Here's what I'm thinking for now:

[HugeTLB, no conversions]

With initial HugeTLB support (no conversions), host userspace
guest_memfd faults will be:

+ For guest_memfd with PUD-sized pages
  + At PUD level or PTE level
+ For guest_memfd with PMD-sized pages
  + At PMD level or PTE level

Since this guest_memfd doesn't support conversions, the folio is never
split/merged, so the direct map is restored at whatever level it was
zapped. I think this works out well.
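
(To make that concrete: I'm picturing the zap/restore pair as symmetric
over the whole folio, roughly like the sketch below. The gmem_folio_*
names are made up, and I'm assuming the helpers bottom out in something
like the per-page set_direct_map_invalid_noflush() /
set_direct_map_default_noflush() primitives - the actual series may
well use different or range-based primitives - so treat this purely as
an illustration of why restore never needs to allocate page tables.)

static int gmem_folio_zap_direct_map(struct folio *folio)
{
	unsigned long start = (unsigned long)folio_address(folio);
	long i, nr = folio_nr_pages(folio);
	int r = 0;

	/*
	 * Any split of a huge direct map entry (and hence any page table
	 * allocation that could fail) happens here, at zap time.
	 */
	for (i = 0; i < nr && !r; i++)
		r = set_direct_map_invalid_noflush(folio_page(folio, i));

	flush_tlb_kernel_range(start, start + nr * PAGE_SIZE);
	return r;
}

static void gmem_folio_restore_direct_map(struct folio *folio)
{
	long i, nr = folio_nr_pages(folio);

	/*
	 * Restore covers exactly the pages zapped above, so the page
	 * tables already exist and only protection bits change.
	 */
	for (i = 0; i < nr; i++)
		WARN_ON_ONCE(set_direct_map_default_noflush(folio_page(folio, i)));
}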

[HugeTLB + conversions]

For a guest_memfd with HugeTLB support and conversions, host userspace
guest_memfd faults will always be at PTE level, so the direct map will
be split and the faulted pages will have their direct map entries
zapped in 4K chunks as they are faulted.

On conversion back to private, those pages are put back into the direct
map (putting aside for now whether to merge the direct map PTEs back to
a higher level).

Unfortunately there's no unmapping callback for guest_memfd to use, so
perhaps the principle should be to put the folios back into the direct
map ASAP - at unmapping time if guest_memfd is doing the unmapping, and
otherwise at freeing time?
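
(For the "otherwise at freeing time" part, I'm imagining something like
the hypothetical sketch below, hooked up as the .free_folio
address_space_operation of the guest_memfd inode. It just reuses the
restore helper from your patch as a last-resort fallback and ignores
the PTE-merge question for now.)

static void gmem_free_folio(struct folio *folio)
{
	/*
	 * Fallback: if the direct map wasn't restored at unmapping /
	 * conversion time, restore it before the folio is returned to
	 * the page allocator, so nobody else ever receives pages that
	 * are missing from the direct map.
	 */
	kvm_gmem_folio_restore_direct_map(folio);
}

static const struct address_space_operations gmem_aops = {
	.free_folio = gmem_free_folio,
	/* ... other ops elided ... */
};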