Message-ID: <CAJuCfpETt1NHOhOzkP+pgUnNLNRq3LRRyadsc20pW+cCDLuyPg@mail.gmail.com>
Date: Wed, 18 Jan 2023 09:40:37 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: Jann Horn <jannh@...gle.com>
Cc: akpm@...ux-foundation.org, michel@...pinasse.org,
jglisse@...gle.com, mhocko@...e.com, vbabka@...e.cz,
hannes@...xchg.org, mgorman@...hsingularity.net, dave@...olabs.net,
willy@...radead.org, liam.howlett@...cle.com, peterz@...radead.org,
ldufour@...ux.ibm.com, laurent.dufour@...ibm.com,
paulmck@...nel.org, luto@...nel.org, songliubraving@...com,
peterx@...hat.com, david@...hat.com, dhowells@...hat.com,
hughd@...gle.com, bigeasy@...utronix.de, kent.overstreet@...ux.dev,
punit.agrawal@...edance.com, lstoakes@...il.com,
peterjung1337@...il.com, rientjes@...gle.com,
axelrasmussen@...gle.com, joelaf@...gle.com, minchan@...gle.com,
shakeelb@...gle.com, tatashin@...gle.com, edumazet@...gle.com,
gthelen@...gle.com, gurua@...gle.com, arjunroy@...gle.com,
soheil@...gle.com, hughlynch@...gle.com, leewalsh@...gle.com,
posk@...gle.com, linux-mm@...ck.org,
linux-arm-kernel@...ts.infradead.org,
linuxppc-dev@...ts.ozlabs.org, x86@...nel.org,
linux-kernel@...r.kernel.org, kernel-team@...roid.com
Subject: Re: [PATCH 27/41] mm/mmap: prevent pagefault handler from racing with
mmu_notifier registration
On Wed, Jan 18, 2023 at 4:51 AM Jann Horn <jannh@...gle.com> wrote:
>
> On Mon, Jan 9, 2023 at 9:54 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
> > Page fault handlers might need to fire MMU notifications while a new
> > notifier is being registered. Modify mm_take_all_locks to write-lock all
> > VMAs and prevent this race with fault handlers that would hold VMA locks.
> > VMAs are locked before i_mmap_rwsem and anon_vma to keep the same
> > locking order as in page fault handlers.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
> > ---
> > mm/mmap.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 30c7d1c5206e..a256deca0bc0 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -3566,6 +3566,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
> > * of mm/rmap.c:
> > * - all hugetlbfs_i_mmap_rwsem_key locks (aka mapping->i_mmap_rwsem for
> > * hugetlb mapping);
> > + * - all vmas marked locked
>
> The existing comment above says that this is an *ordered* listing of
> which locks are taken.
>
> > * - all i_mmap_rwsem locks;
> > * - all anon_vma->rwseml
> > *
> > @@ -3591,6 +3592,7 @@ int mm_take_all_locks(struct mm_struct *mm)
> > mas_for_each(&mas, vma, ULONG_MAX) {
> > if (signal_pending(current))
> > goto out_unlock;
> > + vma_write_lock(vma);
> > if (vma->vm_file && vma->vm_file->f_mapping &&
> > is_vm_hugetlb_page(vma))
> > vm_lock_mapping(mm, vma->vm_file->f_mapping);
>
> Note that multiple VMAs can have the same ->f_mapping, so with this,
> the lock ordering between VMA locks and the mapping locks of hugetlb
> VMAs is mixed: If you have two adjacent hugetlb VMAs with the same
> ->f_mapping, then the following operations happen:
>
> 1. lock VMA 1
> 2. lock mapping of VMAs 1 and 2
> 3. lock VMA 2
> 4. [second vm_lock_mapping() is a no-op]
>
> So for VMA 1, we ended up taking the VMA lock first, but for VMA 2, we
> took the mapping lock first.
>
> The existing code has one loop per lock type to ensure that the locks
> really are taken in the specified order, even when some of the locks
> are associated with multiple VMAs.
>
> If we don't care about the ordering between these two, maybe that's
> fine and you just have to adjust the comment; but it would be clearer
> to add a separate loop for the VMA locks.
Oh, thanks for pointing out this detail. A separate loop is definitely
needed here. Will do that in the next respin.
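
Roughly what I have in mind for the respin (untested sketch, still using
the vma_write_lock() naming from this series, and resetting the maple
tree iterator with mas_set() before the second pass):

	/*
	 * Write-lock all VMAs first, before any i_mmap_rwsem or
	 * anon_vma lock is taken, so the ordering is the same for
	 * every VMA even when several of them share an f_mapping.
	 */
	mas_for_each(&mas, vma, ULONG_MAX) {
		if (signal_pending(current))
			goto out_unlock;
		vma_write_lock(vma);
	}

	mas_set(&mas, 0);
	mas_for_each(&mas, vma, ULONG_MAX) {
		if (signal_pending(current))
			goto out_unlock;
		if (vma->vm_file && vma->vm_file->f_mapping &&
				is_vm_hugetlb_page(vma))
			vm_lock_mapping(mm, vma->vm_file->f_mapping);
	}

and the ordering comment above the function would list the VMA locks
ahead of the hugetlbfs i_mmap_rwsem locks to match.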
>
> > @@ -3677,6 +3679,7 @@ void mm_drop_all_locks(struct mm_struct *mm)
> > if (vma->vm_file && vma->vm_file->f_mapping)
> > vm_unlock_mapping(vma->vm_file->f_mapping);
> > }
> > + vma_write_unlock_mm(mm);
> >
> > mutex_unlock(&mm_all_locks_mutex);
> > }
> > --
> > 2.39.0
> >