[<prev] [next>] [day] [month] [year] [list]
Message-ID: <CAJuCfpFq23m-KYKaDoCS2K2aM8rO7j8aPa0nVPs-_xP2Sf6GGg@mail.gmail.com>
Date: Mon, 16 Jan 2023 20:52:45 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: Hillf Danton <hdanton@...a.com>
Cc: vbabka@...e.cz, hannes@...xchg.org, mgorman@...hsingularity.net,
peterz@...radead.org, hughd@...gle.com,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock
On Mon, Jan 16, 2023 at 7:16 PM Hillf Danton <hdanton@...a.com> wrote:
>
> On Mon, 16 Jan 2023 15:08:48 -0800 Suren Baghdasaryan <surenb@...gle.com>
> > On Mon, Jan 16, 2023 at 6:07 AM Hillf Danton <hdanton@...a.com> wrote:
> > > On Mon, 9 Jan 2023 12:53:36 -0800 Suren Baghdasaryan <surenb@...gle.com>
> > > > static inline bool vma_read_trylock(struct vm_area_struct *vma)
> > > > {
> > > > /* Check before locking. A race might cause false locked result. */
> > > > - if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > > > + if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > > > return false;
> > >
> > > Add mb to pair with the above wmb like
> >
> > The wmb above is to ensure the ordering between updates of lock_seq
> > and vm_lock->count (lock_seq is updated first and vm_lock->count only
> > after that). The first access to vm_lock->count in this function is
> > atomic_inc_unless_negative() and it's an atomic RMW operation with a
> > return value. According to documentation such functions are fully
> > ordered, therefore I think we already have an implicit full memory
> > barrier between reads of lock_seq and vm_lock->count here. Am I wrong?
>
> No you are not.
I'm not wrong or the other way around? Please expand a bit.
> Revisit it in case of full mb not ensured.
>
> #ifndef arch_atomic_inc_unless_negative
> static __always_inline bool
> arch_atomic_inc_unless_negative(atomic_t *v)
> {
> int c = arch_atomic_read(v);
>
> do {
> if (unlikely(c < 0))
> return false;
> } while (!arch_atomic_try_cmpxchg(v, &c, c + 1));
>
> return true;
> }
> #define arch_atomic_inc_unless_negative arch_atomic_inc_unless_negative
> #endif
I think your point is that the full mb is not ensured here, specifically if c<0?
>
> > > int count = atomic_read(&vma->vm_lock->count);
> > > for (;;) {
> > > int new = count + 1;
> > >
> > > if (count < 0 || new < 0)
> > > return false;
> > >
> > > new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
> > > if (new == count)
> > > break;
> > > count = new;
> > > cpu_relax();
> > > }
> > >
> > > (wake up waiting readers after taking the lock;
> > > but the write lock owner before this read trylock should be
> > > responsible for waking waiters up.)
> > >
> > > lock acquire for read.
> >
> > This schema might cause readers to wait, which is not an exact
>
> Could you specify a bit more on wait?
Oh, I misunderstood your intent with that for() loop. Indeed, if a
writer took the lock, the count will be negative and it terminates
with a failure. Yeah, I think that would work.
>
> > replacement for down_read_trylock(). The requirement to wake up
> > waiting readers also complicates things
>
> If the writer lock owner is preempted by a reader while releasing lock,
>
> set count to zero
> <-- preempt
> wake up waiters
>
> then lock is owned by reader but with read waiters.
>
> That is buggy if write waiter starvation is allowed in this patchset.
I don't quite understand your point here. Readers don't wait, so there
can't be "read waiters". Could you please expand with a race diagram
maybe?
>
> > and since we can always fall
> > back to mmap_lock, that complication is unnecessary IMHO. I could use
> > part of your suggestion like this:
> >
> > int new = count + 1;
> >
> > if (count < 0 || new < 0)
> > return false;
> >
> > new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
> > if (new == count)
> > return false;
> >
> > Compared to doing atomic_inc_unless_negative() first, like I did
> > originally, this schema opens a bit wider window for the writer to get
> > in the middle and cause the reader to fail locking but I don't think
> > it would result in any visible regression.
>
> It is definitely fine for writer to acquire the lock while reader is
> doing trylock, no?
Yes.
>
Powered by blists - more mailing lists