Message-ID: <a185df19-c8a5-4b2f-8bed-19770744a944@oracle.com>
Date: Tue, 15 Oct 2024 17:59:42 -0700
From: Anthony Yznaga <anthony.yznaga@...cle.com>
To: Jann Horn <jannh@...gle.com>
Cc: akpm@...ux-foundation.org, willy@...radead.org, markhemm@...glemail.com,
viro@...iv.linux.org.uk, david@...hat.com, khalid@...nel.org,
andreyknvl@...il.com, dave.hansen@...el.com, luto@...nel.org,
brauner@...nel.org, arnd@...db.de, ebiederm@...ssion.com,
catalin.marinas@....com, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, mhiramat@...nel.org,
rostedt@...dmis.org, vasily.averin@...ux.dev, xhao@...ux.alibaba.com,
pcc@...gle.com, neilb@...e.de, maz@...nel.org
Subject: Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes
On 10/14/24 1:07 PM, Jann Horn wrote:
> On Wed, Sep 4, 2024 at 1:22 AM Anthony Yznaga <anthony.yznaga@...cle.com> wrote:
>> One major issue to address for this series to function correctly
>> is how to ensure proper TLB flushing when a page in a shared
>> region is unmapped. For example, since the rmaps for pages in a
>> shared region map back to host vmas which point to a host mm, TLB
>> flushes won't be directed to the CPUs the sharing processes have
>> run on. I am by no means an expert in this area. One idea is to
>> install an mmu_notifier on the host mm that can gather the necessary
>> data and do flushes similar to the batch flushing.
> The mmu_notifier API has two ways you can use it:
>
> First, there is the classic mode, where before you start modifying
> PTEs in some range, you remove mirrored PTEs from some other context,
> and until you're done with your PTE modification, you don't allow
> creation of new mirrored PTEs. This is intended for cases where
> individual PTE entries are copied over to some other context (such as
> EPT tables for virtualization). When I last looked at that code, it
> looked fine, and this is what KVM uses. But it probably doesn't match
> your usecase, since you wouldn't want removal of a single page to
> cause the entire page table containing it to be temporarily unmapped
> from the processes that use it?
No, definitely don't want to do that. :-)
>
> Second, there is a newer mode for IOMMUv2 stuff (using the
> mmu_notifier_ops::invalidate_range callback), where the idea is that
> you have secondary MMUs that share the normal page tables, and so you
> basically send them invalidations at the same time you invalidate the
> primary MMU for the process. I think that's the right fit for this
> usecase; however, last I looked, this code was extremely broken (see
> https://lore.kernel.org/lkml/CAG48ez2NQKVbv=yG_fq_jtZjf8Q=+Wy54FxcFrK_OujFg5BwSQ@mail.gmail.com/
> for context). Unless that's changed in the meantime, I think someone
> would have to fix that code before it can be relied on for new
> usecases.
Thank you for this background! It looks like the mmu notifiers have changed
since then, and the invalidate_range callback has been renamed to
arch_invalidate_secondary_tlbs. I'm currently looking into using it to
flush all TLBs.
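
As a rough illustration of the idea only (a userspace toy model, not kernel
code and not what the series implements): a notifier-style callback on the
host mm could take the union of the CPUs that every sharing mm has run on
and direct the flush there. All names below (toy_mm, toy_host_mm,
toy_host_add_sharer, toy_invalidate_secondary_tlbs) are hypothetical and
exist only for this sketch; the CPU sets are modeled as plain 64-bit masks.

```c
/* Userspace toy model only: none of these types or functions are kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SHARERS 8

/* Hypothetical stand-in for a process mm that maps the shared region. */
struct toy_mm {
    uint64_t cpumask;   /* bit n set: this mm has run on CPU n */
};

/* Hypothetical stand-in for the host mm that owns the shared page tables. */
struct toy_host_mm {
    struct toy_mm *sharers[MAX_SHARERS];
    size_t nr_sharers;
};

/* Register a sharer; returns 0 on success, -1 if the table is full. */
static int toy_host_add_sharer(struct toy_host_mm *host, struct toy_mm *mm)
{
    if (host->nr_sharers >= MAX_SHARERS)
        return -1;
    host->sharers[host->nr_sharers++] = mm;
    return 0;
}

/*
 * Model of a notifier-style invalidation callback on the host mm:
 * compute the set of CPUs whose TLBs may hold entries for [start, end).
 * A real implementation would then issue flushes to those CPUs; this
 * sketch only returns the target mask.
 */
static uint64_t toy_invalidate_secondary_tlbs(const struct toy_host_mm *host,
                                              unsigned long start,
                                              unsigned long end)
{
    uint64_t targets = 0;

    assert(start < end);
    for (size_t i = 0; i < host->nr_sharers; i++)
        targets |= host->sharers[i]->cpumask;
    return targets;
}
```

With two sharers whose masks are 0x3 and 0x10, the callback would return
0x13, i.e. CPUs 0, 1 and 4 would be targeted. The sketch ignores how the
host mm learns about sharers and how CPU masks are kept up to date, which
is where the real difficulty lies.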
Anthony