[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <wfzu6jvbazlxdybsjc54aoivleif6memaxaacd56bcep24nkv3@s3e3aj253hd6>
Date: Thu, 19 Dec 2024 12:16:45 -0500
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Suren Baghdasaryan <surenb@...gle.com>, akpm@...ux-foundation.org,
willy@...radead.org, lorenzo.stoakes@...cle.com, mhocko@...e.com,
vbabka@...e.cz, hannes@...xchg.org, mjguzik@...il.com,
oliver.sang@...el.com, mgorman@...hsingularity.net, david@...hat.com,
peterx@...hat.com, oleg@...hat.com, dave@...olabs.net,
paulmck@...nel.org, brauner@...nel.org, dhowells@...hat.com,
hdanton@...a.com, hughd@...gle.com, lokeshgidra@...gle.com,
minchan@...gle.com, jannh@...gle.com, shakeel.butt@...ux.dev,
souravpanda@...gle.com, pasha.tatashin@...een.com,
klarasmodin@...il.com, corbet@....net, linux-doc@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
kernel-team@...roid.com
Subject: Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a
reference count
* Peter Zijlstra <peterz@...radead.org> [241219 06:20]:
> On Thu, Dec 19, 2024 at 10:13:34AM +0100, Peter Zijlstra wrote:
> > On Wed, Dec 18, 2024 at 01:53:17PM -0800, Suren Baghdasaryan wrote:
> >
> > > Ah, ok I see now. I completely misunderstood what for_each_vma_range()
> > > was doing.
> > >
> > > Then I think vma_start_write() should remain inside
> > > vms_gather_munmap_vmas() and all vmas in mas_detach should be
> >
> > No, it must not. You really are not modifying anything yet (except the
> > split, which we've already noted mark write themselves).
> >
> > > write-locked, even the ones we are not modifying. Otherwise what would
> > > prevent the race I mentioned before?
> > >
> > > __mmap_region
> > > __mmap_prepare
> > > vms_gather_munmap_vmas // adds vmas to be unmapped into mas_detach,
> > > // some locked
> > > by __split_vma(), some not locked
> > >
> > > lock_vma_under_rcu()
> > > vma = mas_walk // finds
> > > unlocked vma also in mas_detach
> > > vma_start_read(vma) //
> > > succeeds since vma is not locked
> > > // vma->detached, vm_start,
> > > vm_end checks pass
> > > // vma is successfully read-locked
> > >
> > > vms_clean_up_area(mas_detach)
> > > vms_clear_ptes
> > > // steps on a cleared PTE
> >
> > So here we have the added complexity that the vma is not unhooked at
> > all.
Well, hold on - it is taken out of the rmap/anon vma chain here. It is
completely unhooked except the vma tree at this point. We're not adding
complexity, we're dealing with it.
>Is there anything that would prevent a concurrent gup_fast() from
> > doing the same -- touch a cleared PTE?
Where does gup_fast() install PTEs? Doesn't it bail once a READ_ONCE()
on any level returns no PTE?
> >
> > AFAICT two threads, one doing overlapping mmap() and the other doing
> > gup_fast() can result in exactly this scenario.
The mmap() call will race with the gup_fast(), but either the nr_pinned
will be returned from gup_fast() before vms_clean_up_area() removes the
page table (or any higher level), or gup_fast() will find nothing.
> >
> > If we don't care about the GUP case, when I'm thinking we should not
> > care about the lockless RCU case either.
>
> Also, at this point we'll just fail to find a page, and that is nothing
> special. The problem with accessing an unmapped VMA is that the
> page-table walk will instantiate page-tables.
I think there is a problem if we are reinstalling page tables on a vma
that's about to be removed. I think we are avoiding this with our
locking though?
>
> Given this is an overlapping mmap -- we're going to need to those
> page-tables anyway, so no harm done.
Well, maybe? The mapping may now be an anon vma vs a file backed, or
maybe it's PROT_NONE?
> Only after the VMA is unlinked must we ensure we don't accidentally
> re-instantiate page-tables.
It's not as simple as that, unfortunately. There are vma callbacks for
drivers (or hugetlbfs, or whatever) that do other things. So we need to
clean up the area before we are able to replace the vma and part of that
clean up is the page tables, or anon vma chain, and/or closing a file.
There are other ways of finding the vma as well, besides the vma tree.
We are following the locking so that we are safe from those perspectives
as well, and so the vma has to be unlinked in a few places in a certain
order.
Thanks,
Liam
Powered by blists - more mailing lists