Message-ID: <accf2b4b-2a54-4261-b67e-010cb74082ae@intel.com>
Date: Wed, 2 Oct 2024 16:11:27 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Anthony Yznaga <anthony.yznaga@...cle.com>, akpm@...ux-foundation.org,
willy@...radead.org, markhemm@...glemail.com, viro@...iv.linux.org.uk,
david@...hat.com, khalid@...nel.org
Cc: andreyknvl@...il.com, luto@...nel.org, brauner@...nel.org, arnd@...db.de,
ebiederm@...ssion.com, catalin.marinas@....com, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, mhiramat@...nel.org,
rostedt@...dmis.org, vasily.averin@...ux.dev, xhao@...ux.alibaba.com,
pcc@...gle.com, neilb@...e.de, maz@...nel.org,
David Rientjes <rientjes@...gle.com>
Subject: Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes
About TLB flushing...
The quick and dirty thing to do is just flush_tlb_all() after you remove
the PTE from the host mm. That will surely work everywhere and it's as
dirt simple as you get. Honestly, it might even be cheaper than the
alternative.
Also, I don't think PCIDs actually complicate the problem at all. We
basically do remote mm TLB flushes using two mechanisms:
 1. If the mm is loaded, use INVLPG and friends to zap the TLB
 2. Bump mm->context.tlb_gen so that the next time it _gets_
    loaded, the TLB is flushed.
flush_tlb_func() really only cares about #1 since if the mm isn't
loaded, it'll get flushed anyway at the next context switch.
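To make the two mechanisms concrete, here is a rough userspace model in C. It is not kernel code: every toy_* name is made up for illustration, and keeping the "last seen" generation in a per-mm array is a simplification of however the real per-CPU state tracks it. It only shows why an mm that is not loaded needs no IPI: the generation bump catches it at the next switch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for an mm: a generation counter plus per-CPU views. */
struct toy_mm {
	uint64_t tlb_gen;               /* bumped on every flush request */
	uint64_t seen_gen[TOY_NR_CPUS]; /* generation each CPU last caught up to */
};

struct toy_cpu {
	struct toy_mm *loaded_mm;       /* mm currently loaded here, or NULL */
	unsigned int flushes;           /* local flushes performed so far */
};

static struct toy_cpu toy_cpus[TOY_NR_CPUS];

/* Local flush: bring this CPU up to the mm's current generation. */
static void toy_local_flush(unsigned int cpu, struct toy_mm *mm)
{
	assert(cpu < TOY_NR_CPUS);
	toy_cpus[cpu].flushes++;
	mm->seen_gen[cpu] = mm->tlb_gen;
}

/* Request a flush of 'mm' on all CPUs. */
static void toy_flush_mm(struct toy_mm *mm)
{
	/* Mechanism 2: bump the generation for CPUs that load the mm later. */
	mm->tlb_gen++;

	/* Mechanism 1: flush right away where the mm is loaded now. */
	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (toy_cpus[cpu].loaded_mm == mm)
			toy_local_flush(cpu, mm);
	}
}

/* Context switch: flush only if this CPU's view of 'mm' is stale. */
static void toy_switch_mm(unsigned int cpu, struct toy_mm *mm)
{
	assert(cpu < TOY_NR_CPUS);
	toy_cpus[cpu].loaded_mm = mm;
	if (mm && mm->seen_gen[cpu] != mm->tlb_gen)
		toy_local_flush(cpu, mm);
}
```

In this model, calling toy_flush_mm() on an mm that no CPU has loaded performs no flush at all. The stale generation is only noticed, and paid for, when toy_switch_mm() next loads that mm.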
The alternatives I can think of:
Make flush_tlb_mm_range(host_mm) work somehow. You'd need to somehow
keep mm_cpumask(host_mm) up to date and also do something to
flush_tlb_func() to tell it that 'loaded_mm' isn't relevant and it
should flush regardless.
The other way is to use the msharefs's inode ->i_mmap to find all the
VMAs mapping the file, and find all *their* mm's:
	for each vma in inode->i_mmap
		mm = vma->vm_mm
		flush_tlb_mm_range(mm, <vma range here>)
But that might be even worse than flush_tlb_all() because it might end
up sending more than one IPI per CPU.
You can fix _that_ by keeping a single cpumask that you build up:
	mask = 0
	for each vma in inode->i_mmap
		mm = vma->vm_mm
		mask |= mm_cpumask(mm)

	flush_tlb_multi(mask, info);
Unfortunately, 'info->mm' needs to be more than one mm, so you probably
still need a new flush_tlb_func() flush type to tell it to ignore
'info->mm' and flush anyway.
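A small userspace sketch of the IPI-count difference between the per-VMA loop and the merged mask, in C. Nothing here is kernel API: the demo_* names are hypothetical, a cpumask is modeled as a plain 64-bit bitmap, and "sending an IPI" just increments a per-CPU counter. It also does not model the 'info->mm' problem, only how many times each CPU gets interrupted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_CPUS 8

/* Hypothetical stand-ins: bit n of cpumask set means the mm may be live on CPU n. */
struct demo_mm { uint64_t cpumask; };
struct demo_vma { struct demo_mm *vm_mm; };

/* "Send" one IPI to every CPU in 'mask' by bumping its counter. */
static void demo_send_ipis(uint64_t mask, unsigned int ipis[DEMO_NR_CPUS])
{
	for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		if (mask & (UINT64_C(1) << cpu))
			ipis[cpu]++;
	}
}

/* One flush per VMA: a CPU shared by several mms is interrupted several times. */
static void demo_flush_per_vma(const struct demo_vma *vmas, size_t n,
			       unsigned int ipis[DEMO_NR_CPUS])
{
	assert(n == 0 || vmas != NULL);
	for (size_t i = 0; i < n; i++)
		demo_send_ipis(vmas[i].vm_mm->cpumask, ipis);
}

/* Build up a single mask first: each CPU is interrupted at most once. */
static void demo_flush_merged(const struct demo_vma *vmas, size_t n,
			      unsigned int ipis[DEMO_NR_CPUS])
{
	uint64_t mask = 0;

	assert(n == 0 || vmas != NULL);
	for (size_t i = 0; i < n; i++)
		mask |= vmas[i].vm_mm->cpumask;
	demo_send_ipis(mask, ipis);
}
```

With two mms whose masks overlap on one CPU, say CPUs {0,1} and {1,2}, the per-VMA version counts two IPIs on CPU 1 while the merged version counts one on each of CPUs 0-2.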
After all that, I kinda like flush_tlb_all(). ;)