Message-ID: <DDVS9ITBCE2Z.RSTLCU79EX8G@google.com>
Date: Thu, 30 Oct 2025 16:05:05 +0000
From: Brendan Jackman <jackmanb@...gle.com>
To: Dave Hansen <dave.hansen@...el.com>, "Roy, Patrick" <roypat@...zon.co.uk>
Cc: "pbonzini@...hat.com" <pbonzini@...hat.com>, "corbet@....net" <corbet@....net>, 
	"maz@...nel.org" <maz@...nel.org>, "oliver.upton@...ux.dev" <oliver.upton@...ux.dev>, 
	"joey.gouly@....com" <joey.gouly@....com>, "suzuki.poulose@....com" <suzuki.poulose@....com>, 
	"yuzenghui@...wei.com" <yuzenghui@...wei.com>, "catalin.marinas@....com" <catalin.marinas@....com>, 
	"will@...nel.org" <will@...nel.org>, "tglx@...utronix.de" <tglx@...utronix.de>, 
	"mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>, 
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "x86@...nel.org" <x86@...nel.org>, 
	"hpa@...or.com" <hpa@...or.com>, "luto@...nel.org" <luto@...nel.org>, 
	"peterz@...radead.org" <peterz@...radead.org>, "willy@...radead.org" <willy@...radead.org>, 
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "david@...hat.com" <david@...hat.com>, 
	"lorenzo.stoakes@...cle.com" <lorenzo.stoakes@...cle.com>, 
	"Liam.Howlett@...cle.com" <Liam.Howlett@...cle.com>, "vbabka@...e.cz" <vbabka@...e.cz>, 
	"rppt@...nel.org" <rppt@...nel.org>, "surenb@...gle.com" <surenb@...gle.com>, "mhocko@...e.com" <mhocko@...e.com>, 
	"song@...nel.org" <song@...nel.org>, "jolsa@...nel.org" <jolsa@...nel.org>, "ast@...nel.org" <ast@...nel.org>, 
	"daniel@...earbox.net" <daniel@...earbox.net>, "andrii@...nel.org" <andrii@...nel.org>, 
	"martin.lau@...ux.dev" <martin.lau@...ux.dev>, "eddyz87@...il.com" <eddyz87@...il.com>, 
	"yonghong.song@...ux.dev" <yonghong.song@...ux.dev>, 
	"john.fastabend@...il.com" <john.fastabend@...il.com>, "kpsingh@...nel.org" <kpsingh@...nel.org>, 
	"sdf@...ichev.me" <sdf@...ichev.me>, "haoluo@...gle.com" <haoluo@...gle.com>, "jgg@...pe.ca" <jgg@...pe.ca>, 
	"jhubbard@...dia.com" <jhubbard@...dia.com>, "peterx@...hat.com" <peterx@...hat.com>, 
	"jannh@...gle.com" <jannh@...gle.com>, "pfalcato@...e.de" <pfalcato@...e.de>, 
	"shuah@...nel.org" <shuah@...nel.org>, "seanjc@...gle.com" <seanjc@...gle.com>, 
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, 
	"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>, 
	"kvmarm@...ts.linux.dev" <kvmarm@...ts.linux.dev>, 
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>, 
	"bpf@...r.kernel.org" <bpf@...r.kernel.org>, 
	"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>, "Cali, Marco" <xmarcalx@...zon.co.uk>, 
	"Kalyazin, Nikita" <kalyazin@...zon.co.uk>, "Thomson, Jack" <jackabt@...zon.co.uk>, 
	"derekmn@...zon.co.uk" <derekmn@...zon.co.uk>, "tabba@...gle.com" <tabba@...gle.com>, 
	"ackerleytng@...gle.com" <ackerleytng@...gle.com>
Subject: Re: [PATCH v7 06/12] KVM: guest_memfd: add module param for disabling
 TLB flushing
On Thu Sep 25, 2025 at 6:27 PM UTC, Dave Hansen wrote:
> On 9/24/25 08:22, Roy, Patrick wrote:
>> Add an option to not perform TLB flushes after direct map manipulations.
>
> I'd really prefer this be left out for now. It's a massive can of worms.
> Let's agree on something that works and has well-defined behavior before
> we go breaking it on purpose.

As David pointed out in the MM Alignment Session yesterday, I might be
able to help here. In [0] I've proposed a way to break up the direct map
by ASI's "sensitivity" concept. That is weaker than the "totally absent
from the direct map" property being proposed here, but it has somewhat
similar implementation challenges.

Basically, it introduces a thing called a "freetype" that extends the
idea of migratetype. Like migratetype, it's used to physically group
pages when allocating, and free pages are indexed by it, i.e. each
freetype gets its own freelist. But it can also encode information
other than mobility (and the other things currently encoded in
migratetype...).

Could it make sense to use that logic to have entire pageblocks that are
absent from the direct map? Allocations for guest_memfd would then be
served from those pageblocks, and we would only have to flush the TLB
when there's no free memory left in pageblocks of this freetype (so the
allocator has to flip another pageblock over to the "no direct map"
freetype, after removing it from the direct map).

I haven't investigated this properly yet; I'll start doing that now.
But I thought I'd drop this note right away in case anyone can
immediately see a reason why this doesn't work.

[0] https://lore.kernel.org/all/20250924-b4-asi-page-alloc-v1-0-2d861768041f@google.com/T/#t

BTW, I think if the skip-flush flag is the only thing blocking this
patchset, it would be great to merge it without that flag. Even if that
means it's no use for Firecracker use cases, that doesn't mean the
underlying feature isn't valuable for _someone_. Then we can figure out
how to make it work for Firecracker afterwards, one way or another.

(Just to be transparent: my nefarious ulterior motive is that it would
give me an angle to start merging code that will eventually support ASI.
But I'm serious that there are probably users who would like this
feature even if it's slow!)