[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <jes252u5qfhla2bdmg6pdkfpi4a2jfhf7d5b6ra6ol2bmt352x@gunhzaca56df>
Date: Wed, 18 Dec 2024 14:38:32 -0500
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>, akpm@...ux-foundation.org,
willy@...radead.org, lorenzo.stoakes@...cle.com, mhocko@...e.com,
vbabka@...e.cz, hannes@...xchg.org, mjguzik@...il.com,
oliver.sang@...el.com, mgorman@...hsingularity.net, david@...hat.com,
peterx@...hat.com, oleg@...hat.com, dave@...olabs.net,
paulmck@...nel.org, brauner@...nel.org, dhowells@...hat.com,
hdanton@...a.com, hughd@...gle.com, lokeshgidra@...gle.com,
minchan@...gle.com, jannh@...gle.com, shakeel.butt@...ux.dev,
souravpanda@...gle.com, pasha.tatashin@...een.com,
klarasmodin@...il.com, corbet@....net, linux-doc@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
kernel-team@...roid.com
Subject: Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a
reference count
* Suren Baghdasaryan <surenb@...gle.com> [241218 14:29]:
> On Wed, Dec 18, 2024 at 11:07 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
> >
> > On Wed, Dec 18, 2024 at 11:00 AM 'Liam R. Howlett' via kernel-team
> > <kernel-team@...roid.com> wrote:
> > >
> > > * Suren Baghdasaryan <surenb@...gle.com> [241218 12:58]:
> > > > On Wed, Dec 18, 2024 at 9:44 AM Peter Zijlstra <peterz@...radead.org> wrote:
> > > > >
> > > > > On Wed, Dec 18, 2024 at 09:36:42AM -0800, Suren Baghdasaryan wrote:
> > > > >
> > > > > > > You will not. vms_complete_munmap_vmas() will call remove_vma() to
> > > > > > > remove PTEs IIRC, and if you do start_write() and detach() before
> > > > > > > dropping mmap_lock_write, you should be good.
> > > > > >
> > > > > > Ok, I think we will have to move mmap_write_downgrade() inside
> > > > > > vms_complete_munmap_vmas() to be called after remove_vma().
> > > > > > vms_clear_ptes() is using vmas, so we can't move remove_vma() before
> > > > > > mmap_write_downgrade().
> > > > >
> > > > > Why ?!
> > > > >
> > > > > vms_clear_ptes() and remove_vma() are fine where they are -- there is no
> > > > > concurrency left at this point.
> > > > >
> > > > > Note that by doing vma_start_write() inside vms_complete_munmap_vmas(),
> > > > > which is *after* the vmas have been unhooked from the mm, you wait for
> > > > > any concurrent user to go away.
> > > > >
> > > > > And since they're unhooked, there can't be any new users.
> > > > >
> > > > > So you're the one and only user left, and code is fine the way it is.
> > > >
> > > > Ok, let me make sure I understand this part of your proposal. From
> > > > your earlier email:
> > > >
> > > > @@ -1173,6 +1173,11 @@ static void vms_complete_munmap_vmas(struct
> > > > vma_munmap_struct *vms,
> > > > struct vm_area_struct *vma;
> > > > struct mm_struct *mm;
> > > >
> > > > + mas_for_each(mas_detach, vma, ULONG_MAX) {
> > > > + vma_start_write(next);
> > > > + vma_mark_detached(next, true);
> > > > + }
> > > > +
> > > > mm = current->mm;
> > > > mm->map_count -= vms->vma_count;
> > > > mm->locked_vm -= vms->locked_vm;
> > > >
> > > > This would mean:
> > > >
> > > > vms_complete_munmap_vmas
> > > > vma_start_write
> > > > vma_mark_detached
> > > > mmap_write_downgrade
> > > > vms_clear_ptes
> > > > remove_vma
> > > >
> > > > And remove_vma will be just freeing the vmas. Is that correct?
> > > > I'm a bit confused because the original thinking was that
> > > > vma_mark_detached() would drop the last refcnt and if it's 0 we would
> > > > free the vma right there. If that's still what we want to do then I
> > > > think the above sequence should look like this:
> > > >
> > > > vms_complete_munmap_vmas
> > > > vms_clear_ptes
> > > > remove_vma
> > > > vma_start_write
> > > > vma_mark_detached
> > > > mmap_write_downgrade
> > > >
> > > > because vma_start_write+vma_mark_detached should be done under mmap_write_lock.
> > > > Please let me know which way you want to move forward.
> > > >
> > >
> > > Are we sure we're not causing issues with the MAP_FIXED path here?
> > >
> > > With the above change, we'd be freeing the PTEs before marking the vmas
> > > as detached or vma_start_write().
> >
> > IIUC when we call vms_complete_munmap_vmas() all vmas inside
> > mas_detach have been already write-locked, no?
That's the way it is today - but I thought you were moving the lock to
the complete stage, not adding a new one? (why add a new one otherwise?)
>
> Yeah, I think we can simply do this:
>
> vms_complete_munmap_vmas
> vms_clear_ptes
> remove_vma
> vma_mark_detached
> mmap_write_downgrade
>
> If my assumption is incorrect, assertion inside vma_mark_detached()
> should trigger. I tried a quick test and so far nothing exploded.
>
If they are write locked, then the page faults are not a concern. There
is also the rmap race that Jann found in mmap_region() [1]. This is
probably also fine since you are keeping the write lock in place earlier
on in the gather stage. Note the ptes will already be cleared by the
time vms_complete_munmap_vmas() is called in this case.
[1] https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
Thanks,
Liam
Powered by blists - more mailing lists