Message-ID: <2c392f07-1363-4306-a1cc-ac89decdd7bc@redhat.com>
Date: Mon, 7 Oct 2024 10:48:04 +0200
From: David Hildenbrand <david@...hat.com>
To: Dave Hansen <dave.hansen@...el.com>,
Anthony Yznaga <anthony.yznaga@...cle.com>, akpm@...ux-foundation.org,
willy@...radead.org, markhemm@...glemail.com, viro@...iv.linux.org.uk,
khalid@...nel.org
Cc: andreyknvl@...il.com, luto@...nel.org, brauner@...nel.org, arnd@...db.de,
ebiederm@...ssion.com, catalin.marinas@....com, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, mhiramat@...nel.org,
rostedt@...dmis.org, vasily.averin@...ux.dev, xhao@...ux.alibaba.com,
pcc@...gle.com, neilb@...e.de, maz@...nel.org,
David Rientjes <rientjes@...gle.com>
Subject: Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes
On 02.10.24 19:35, Dave Hansen wrote:
> We were just chatting about this on David Rientjes's MM alignment call.
> I thought I'd try to give a little brain
>
> Let's start by thinking about KVM and secondary MMUs. KVM has a primary
> mm: the QEMU (or whatever) process mm. The virtualization (EPT/NPT)
> tables get entries that effectively mirror the primary mm page tables
> and constitute a secondary MMU. If the primary page tables change,
> mmu_notifiers ensure that the changes get reflected into the
> virtualization tables and also that the virtualization paging structure
> caches are flushed.
>
> msharefs is doing something very similar. But, in the msharefs case,
> the secondary MMUs are actually normal CPU MMUs. The page tables are
> normal old page tables and the caches are the normal old TLB. That's
> what makes it so confusing: we have lots of infrastructure for dealing
> with that "stuff" (CPU page tables and TLB), but msharefs has
> short-circuited the infrastructure and it doesn't work any more.
>
> Basically, I think it makes a lot of sense to check what KVM (or another
> mmu_notifier user) is doing and make sure that msharefs is following its
> lead. For instance, KVM _should_ have the exact same "page free"
> flushing issue where it gets the MMU notifier call but the page may
> still be in the secondary MMU. I _think_ KVM fixes it with an extra
> page refcount that it takes when it first walks the primary page tables.
Forgot to comment on this (brain still recovering ...).
KVM only grabs a temporary reference, and drops that reference once the
secondary MMU PTE has been updated (present PTE installed). The notifiers on
primary MMU changes (e.g., unmap) take care of any TLB invalidation
before the primary MMU lets go of the page and the refcount is dropped.
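
As a rough illustration of that ordering, here is a toy user-space model.
It is only a sketch: every name in it (toy_page, toy_mmu, toy_secondary_fault,
toy_notifier_invalidate, toy_primary_unmap, toy_demo) is made up for this
example and is not a kernel or KVM API, and the "TLB" is just a flag.

```c
/* Toy user-space model of the ordering; none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
    int refcount;           /* references currently held on the page */
};

struct toy_mmu {
    struct toy_page *pte;   /* NULL means "no present PTE" */
    bool tlb_cached;        /* true if a translation may still be cached */
};

/* Secondary-MMU fault: the reference is held only while installing the PTE. */
void toy_secondary_fault(struct toy_mmu *primary, struct toy_mmu *secondary)
{
    struct toy_page *page = primary->pte;

    page->refcount++;               /* temporary reference during the walk */
    secondary->pte = page;          /* present PTE installed */
    secondary->tlb_cached = true;   /* hardware may cache the translation */
    page->refcount--;               /* dropped again: no long-term pin */
}

/* Notifier callback: zap the secondary PTE and flush the secondary TLB. */
void toy_notifier_invalidate(struct toy_mmu *secondary)
{
    secondary->pte = NULL;
    secondary->tlb_cached = false;
}

/* Primary unmap: notify first, then let go of the page.
 * Returns the refcount that remains after the primary reference is dropped. */
int toy_primary_unmap(struct toy_mmu *primary, struct toy_mmu *secondary)
{
    struct toy_page *page = primary->pte;

    toy_notifier_invalidate(secondary);   /* secondary MMU is clean first */
    primary->pte = NULL;
    primary->tlb_cached = false;

    /* Nothing may still translate to the page once it is released. */
    assert(secondary->pte == NULL && !secondary->tlb_cached);
    return --page->refcount;
}

/* Walk through fault followed by unmap; returns the final refcount. */
int toy_demo(void)
{
    struct toy_page page = { .refcount = 1 };   /* held by the primary mapping */
    struct toy_mmu primary = { .pte = &page, .tlb_cached = true };
    struct toy_mmu secondary = { .pte = NULL, .tlb_cached = false };

    toy_secondary_fault(&primary, &secondary);
    assert(page.refcount == 1);   /* the secondary mapping holds no reference */
    return toy_primary_unmap(&primary, &secondary);
}
```

Calling toy_demo() from a main() walks through both steps. The point of the
model is that the secondary mapping never holds a reference of its own: what
keeps it safe is that the invalidation runs before the last reference goes away.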
--
Cheers,
David / dhildenb