linux-kernel - Re: [PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma

Message-ID: <CAJuCfpGU3c102mLZBY6UzkbW-DtfpYF77wLgzFpRWagBw8XfMQ@mail.gmail.com>
Date:   Tue, 17 Jan 2023 10:21:28 -0800
From:   Suren Baghdasaryan <surenb@...gle.com>
To:     Hillf Danton <hdanton@...a.com>
Cc:     vbabka@...e.cz, hannes@...xchg.org, mgorman@...hsingularity.net,
        peterz@...radead.org, hughd@...gle.com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock

On Tue, Jan 17, 2023 at 12:34 AM Hillf Danton <hdanton@...a.com> wrote:
>
> On Mon, 16 Jan 2023 20:52:45 -0800 Suren Baghdasaryan <surenb@...gle.com>
> > On Mon, Jan 16, 2023 at 7:16 PM Hillf Danton <hdanton@...a.com> wrote:
> > > No, you are not.
> >
> > I'm not wrong or the other way around? Please expand a bit.
>
> You are not wrong.

OK, would we be fine if I rewrite vma_read_trylock() like this?

static inline bool vma_read_trylock(struct vm_area_struct *vma)
{
	int count, new;

	/* Check before locking. A race might cause a false locked result. */
	if (READ_ONCE(vma->vm_lock->lock_seq) ==
	    READ_ONCE(vma->vm_mm->mm_lock_seq))
		return false;

	count = atomic_read(&vma->vm_lock->count);
	for (;;) {
		/*
		 * Is the VMA write-locked? Overflow might produce a false
		 * locked result.
		 * A false unlocked result is impossible because we modify and
		 * check vma->vm_lock_seq under vma->vm_lock protection, and
		 * mm->mm_lock_seq modification invalidates all existing locks.
		 */
		if (count < 0)
			return false;

		new = count + 1;
		/* If atomic_t overflows, fail to lock. */
		if (new < 0)
			return false;

		/*
		 * Atomic RMW will provide an implicit mb on success to pair
		 * with smp_wmb in vma_write_lock; on failure we retry.
		 */
		new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
		if (new == count)
			break;
		count = new;
		cpu_relax();
	}
	if (unlikely(READ_ONCE(vma->vm_lock->lock_seq) ==
		     READ_ONCE(vma->vm_mm->mm_lock_seq))) {
		if (atomic_dec_and_test(&vma->vm_lock->count))
			wake_up(&vma->vm_mm->vma_writer_wait);
		return false;
	}
	return true;
}
> > >
> > > If the writer lock owner is preempted by a reader while releasing the lock,
> > >
> > >         set count to zero
> > >                           <-- preempt
> > >         wake up waiters
> > >
> > > then the lock is owned by a reader while there are still read waiters.
> > >
> > > That is buggy if write waiter starvation is allowed in this patchset.
> >
> > I don't quite understand your point here. Readers don't wait, so there
> > can't be "read waiters". Could you please expand with a race diagram
> > maybe?
>
>         cpu3                    cpu2
>         ---                     ---
>         taskA bound to cpu3
>         down_write(&mm->mmap_lock);
>         vma_write_lock L
>                                 taskB fails to take L for read
>                                 taskC fails to take mmap_lock for write
>                                 taskD fails to take L for read
>         vma_write_unlock_mm(mm);
>
>         preempted by taskE
>            taskE takes L for read, and
>            the read waiters of L, taskB and taskD,
>            should be woken up
>
>         up_write(&mm->mmap_lock);

Readers never wait for the VMA lock; that's why we have only
vma_read_trylock() and no vma_read_lock(). In your scenario, taskB and
taskD will fall back to taking mmap_lock for read after their
vma_read_trylock() calls fail. Once taskA does up_write(&mm->mmap_lock),
they will be woken up, since they are blocked on taking mmap_lock for read.

>