Message-ID: <CAJuCfpHoHcZxQZgt4Ki1kiBu9O+sANZQambOa+1gSQu2brPoyA@mail.gmail.com>
Date: Mon, 16 Jan 2023 15:08:48 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: Hillf Danton <hdanton@...a.com>
Cc: vbabka@...e.cz, hannes@...xchg.org, mgorman@...hsingularity.net,
peterz@...radead.org, hughd@...gle.com,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock
On Mon, Jan 16, 2023 at 6:07 AM Hillf Danton <hdanton@...a.com> wrote:
>
> On Mon, 9 Jan 2023 12:53:36 -0800 Suren Baghdasaryan <surenb@...gle.com> wrote:
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -627,12 +627,16 @@ static inline void vma_write_lock(struct vm_area_struct *vma)
> > * mm->mm_lock_seq can't be concurrently modified.
> > */
> > mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
> > - if (vma->vm_lock_seq == mm_lock_seq)
> > + if (vma->vm_lock->lock_seq == mm_lock_seq)
> > return;
>
> lock acquire for write, to inform lockdep.
Thanks for the review, Hillf!
Good idea. Will add in the next version.
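Something along these lines, perhaps (rough sketch only; it assumes a
hypothetical dep_map field in struct vma_lock, initialized with
lockdep_init_map(), which is not part of this patch):

        static inline void vma_write_lock(struct vm_area_struct *vma)
        {
                int mm_lock_seq;

                mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
                if (vma->vm_lock->lock_seq == mm_lock_seq)
                        return;

                /* Hypothetical: tell lockdep we take the lock for write. */
                rwsem_acquire(&vma->vm_lock->dep_map, 0, 0, _RET_IP_);
                if (atomic_cmpxchg(&vma->vm_lock->count, 0, -1))
                        wait_event(vma->vm_mm->vma_writer_wait,
                                   atomic_cmpxchg(&vma->vm_lock->count, 0, -1) == 0);
                vma->vm_lock->lock_seq = mm_lock_seq;
                /* Write barrier to ensure lock_seq change is visible before count */
                smp_wmb();
                atomic_set(&vma->vm_lock->count, 0);
                /*
                 * Hypothetical: the count is dropped right away above, so
                 * the release annotation goes here too; where it really
                 * belongs depends on when the write lock is considered
                 * dropped in the final locking model.
                 */
                rwsem_release(&vma->vm_lock->dep_map, _RET_IP_);
        }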
> >
> > - down_write(&vma->vm_lock->lock);
> > - vma->vm_lock_seq = mm_lock_seq;
> > - up_write(&vma->vm_lock->lock);
> > + if (atomic_cmpxchg(&vma->vm_lock->count, 0, -1))
> > + wait_event(vma->vm_mm->vma_writer_wait,
> > + atomic_cmpxchg(&vma->vm_lock->count, 0, -1) == 0);
> > + vma->vm_lock->lock_seq = mm_lock_seq;
> > + /* Write barrier to ensure lock_seq change is visible before count */
> > + smp_wmb();
> > + atomic_set(&vma->vm_lock->count, 0);
> > }
> >
> > /*
> > @@ -643,20 +647,28 @@ static inline void vma_write_lock(struct vm_area_struct *vma)
> > static inline bool vma_read_trylock(struct vm_area_struct *vma)
> > {
> > /* Check before locking. A race might cause false locked result. */
> > - if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > + if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > return false;
>
> Add mb to pair with the above wmb like
The wmb above is there to ensure the ordering between the updates of
lock_seq and vm_lock->count (lock_seq is updated first and
vm_lock->count only after that). The first access to vm_lock->count in
this function is atomic_inc_unless_negative(), which is an atomic RMW
operation with a return value. According to the documentation, such
functions are fully ordered, so I think we already have an implicit
full memory barrier between the reads of lock_seq and vm_lock->count
here. Am I wrong?
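In other words, the pairing I have in mind is (sketch, using the names
from this patch):

        /* Writer side (vma_write_lock): */
        vma->vm_lock->lock_seq = mm_lock_seq;
        smp_wmb();      /* order the lock_seq store before the count store */
        atomic_set(&vma->vm_lock->count, 0);

        /* Reader side (vma_read_trylock): */
        if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
                return false;
        /*
         * A successful value-returning atomic RMW is fully ordered
         * (Documentation/atomic_t.txt), so it acts as the full barrier
         * between the lock_seq read above and the count access.
         */
        if (unlikely(!atomic_inc_unless_negative(&vma->vm_lock->count)))
                return false;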
>
>         if (READ_ONCE(vma->vm_lock->lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq)) {
>                 smp_acquire__after_ctrl_dep();
>                 return false;
>         }
> >
> > - if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
> > + if (unlikely(!atomic_inc_unless_negative(&vma->vm_lock->count)))
> > return false;
> >
> > + /* If atomic_t overflows, restore and fail to lock. */
> > + if (unlikely(atomic_read(&vma->vm_lock->count) < 0)) {
> > + if (atomic_dec_and_test(&vma->vm_lock->count))
> > + wake_up(&vma->vm_mm->vma_writer_wait);
> > + return false;
> > + }
> > +
> > /*
> > * Overflow might produce false locked result.
> > * False unlocked result is impossible because we modify and check
> > * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
> > * modification invalidates all existing locks.
> > */
> > - if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
> > - up_read(&vma->vm_lock->lock);
> > + if (unlikely(vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
> > + if (atomic_dec_and_test(&vma->vm_lock->count))
> > + wake_up(&vma->vm_mm->vma_writer_wait);
> > return false;
> > }
>
> Simpler way to detect write lock owner and count overflow like
>
>         int count = atomic_read(&vma->vm_lock->count);
>
>         for (;;) {
>                 int new = count + 1;
>
>                 if (count < 0 || new < 0)
>                         return false;
>
>                 new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
>                 if (new == count)
>                         break;
>                 count = new;
>                 cpu_relax();
>         }
>
> (wake up waiting readers after taking the lock;
> but the write-lock owner that ran before this read trylock
> should be responsible for waking waiters up.)
>
> lock acquire for read.
This scheme might cause readers to wait, which is not an exact
replacement for down_read_trylock(). The requirement to wake up
waiting readers also complicates things, and since we can always fall
back to mmap_lock, that complication is unnecessary IMHO. I could use
part of your suggestion, like this:
        int count = atomic_read(&vma->vm_lock->count);
        int new = count + 1;

        if (count < 0 || new < 0)
                return false;

        new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
        if (new != count)
                return false;
Compared to doing atomic_inc_unless_negative() first, like I did
originally, this scheme opens a slightly wider window for the writer
to get in the middle and cause the reader to fail locking, but I don't
think it would result in any visible regression.
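Put together, the trylock would look roughly like this (sketch only,
not tested):

        static inline bool vma_read_trylock(struct vm_area_struct *vma)
        {
                int count, new;

                /* Check before locking. A race might cause false locked result. */
                if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
                        return false;

                count = atomic_read(&vma->vm_lock->count);
                new = count + 1;
                /* Write-locked (negative count) or overflow: fail to lock. */
                if (count < 0 || new < 0)
                        return false;
                /* Single attempt; any concurrent change fails the trylock. */
                if (atomic_cmpxchg(&vma->vm_lock->count, count, new) != count)
                        return false;

                /* Recheck: a writer might have invalidated the lock meanwhile. */
                if (unlikely(vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
                        if (atomic_dec_and_test(&vma->vm_lock->count))
                                wake_up(&vma->vm_mm->vma_writer_wait);
                        return false;
                }
                return true;
        }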
>
> > return true;
> > @@ -664,7 +676,8 @@ static inline bool vma_read_trylock(struct vm_area_struct *vma)
> >
> > static inline void vma_read_unlock(struct vm_area_struct *vma)
> > {
> lock release for read.
Ack.
>
> > - up_read(&vma->vm_lock->lock);
> > + if (atomic_dec_and_test(&vma->vm_lock->count))
> > + wake_up(&vma->vm_mm->vma_writer_wait);
> > }
>