linux-kernel - Re: [PATCH 12/41] mm: add per-VMA lock and helper functions to control it

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAG48ez3sCwasFzKD5CsqMFA2W57-2fazd75g7r0NaA_BVNTLow@mail.gmail.com>
Date:   Tue, 17 Jan 2023 22:45:25 +0100
From:   Jann Horn <jannh@...gle.com>
To:     Suren Baghdasaryan <surenb@...gle.com>
Cc:     peterz@...radead.org, Ingo Molnar <mingo@...hat.com>,
        Will Deacon <will@...nel.org>, akpm@...ux-foundation.org,
        michel@...pinasse.org, jglisse@...gle.com, mhocko@...e.com,
        vbabka@...e.cz, hannes@...xchg.org, mgorman@...hsingularity.net,
        dave@...olabs.net, willy@...radead.org, liam.howlett@...cle.com,
        ldufour@...ux.ibm.com, laurent.dufour@...ibm.com,
        paulmck@...nel.org, luto@...nel.org, songliubraving@...com,
        peterx@...hat.com, david@...hat.com, dhowells@...hat.com,
        hughd@...gle.com, bigeasy@...utronix.de, kent.overstreet@...ux.dev,
        punit.agrawal@...edance.com, lstoakes@...il.com,
        peterjung1337@...il.com, rientjes@...gle.com,
        axelrasmussen@...gle.com, joelaf@...gle.com, minchan@...gle.com,
        shakeelb@...gle.com, tatashin@...gle.com, edumazet@...gle.com,
        gthelen@...gle.com, gurua@...gle.com, arjunroy@...gle.com,
        soheil@...gle.com, hughlynch@...gle.com, leewalsh@...gle.com,
        posk@...gle.com, linux-mm@...ck.org,
        linux-arm-kernel@...ts.infradead.org,
        linuxppc-dev@...ts.ozlabs.org, x86@...nel.org,
        linux-kernel@...r.kernel.org, kernel-team@...roid.com
Subject: Re: [PATCH 12/41] mm: add per-VMA lock and helper functions to
 control it

On Tue, Jan 17, 2023 at 10:28 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
> On Tue, Jan 17, 2023 at 10:03 AM Jann Horn <jannh@...gle.com> wrote:
> >
> > +locking maintainers
>
> Thanks! I'll CC the locking maintainers in the next posting.
>
> >
> > On Mon, Jan 9, 2023 at 9:54 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
> > > Introduce a per-VMA rw_semaphore to be used during page fault handling
> > > instead of mmap_lock. Because there are cases when multiple VMAs need
> > > to be exclusively locked during VMA tree modifications, instead of the
> > > usual lock/unlock patter we mark a VMA as locked by taking per-VMA lock
> > > exclusively and setting vma->lock_seq to the current mm->lock_seq. When
> > > mmap_write_lock holder is done with all modifications and drops mmap_lock,
> > > it will increment mm->lock_seq, effectively unlocking all VMAs marked as
> > > locked.
> > [...]
> > > +static inline void vma_read_unlock(struct vm_area_struct *vma)
> > > +{
> > > +       up_read(&vma->lock);
> > > +}
> >
> > One thing that might be gnarly here is that I think you might not be
> > allowed to use up_read() to fully release ownership of an object -
> > from what I remember, I think that up_read() (unlike something like
> > spin_unlock()) can access the lock object after it's already been
> > acquired by someone else. So if you want to protect against concurrent
> > deletion, this might have to be something like:
> >
> > rcu_read_lock(); /* keeps vma alive */
> > up_read(&vma->lock);
> > rcu_read_unlock();
>
> But for deleting VMA one would need to write-lock the vma->lock first,
> which I assume can't happen until this up_read() is complete. Is that
> assumption wrong?

__up_read() does:

rwsem_clear_reader_owned(sem);
tmp = atomic_long_add_return_release(-RWSEM_READER_BIAS, &sem->count);
DEBUG_RWSEMS_WARN_ON(tmp < 0, sem);
if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS)) ==
      RWSEM_FLAG_WAITERS)) {
  clear_nonspinnable(sem);
  rwsem_wake(sem);
}

The atomic_long_add_return_release() is the point where we are doing
the main lock-releasing.

So if a reader dropped the read-lock while someone else was waiting on
the lock (RWSEM_FLAG_WAITERS) and no other readers were holding the
lock together with it, the reader also does clear_nonspinnable() and
rwsem_wake() afterwards.
But in rwsem_down_write_slowpath(), after we've set
RWSEM_FLAG_WAITERS, we can return successfully immediately once
rwsem_try_write_lock() sees that there are no active readers or
writers anymore (if RWSEM_LOCK_MASK is unset and the cmpxchg
succeeds). We're not necessarily waiting for the "nonspinnable" bit or
the wake.

So yeah, I think down_write() can return successfully before up_read()
is done with its memory accesses.

(Spinlocks are different - the kernel relies on being able to drop
references via spin_unlock() in some places.)