Message-ID: <Y8/WBFj4uLz4N2ZH@hirez.programming.kicks-ass.net>
Date: Tue, 24 Jan 2023 13:58:44 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Waiman Long <longman@...hat.com>
Cc: Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
linux-kernel@...r.kernel.org, john.p.donnelly@...cle.com,
Hillf Danton <hdanton@...a.com>,
Mukesh Ojha <quic_mojha@...cinc.com>,
Ting11 Wang 王婷 <wangting11@...omi.com>
Subject: Re: [PATCH v6 5/6] locking/rwsem: Enable direct rwsem lock handoff
On Mon, Jan 23, 2023 at 05:07:08PM -0500, Waiman Long wrote:
> On 1/23/23 12:30, Waiman Long wrote:
> > I will update the patch description to highlight the points that I
> > discussed in this email.
>
> I am planning to update the patch description to as follows:
>
> The lock handoff provided in rwsem isn't a true handoff like that in
> the mutex. Instead, it is more like a quiescent state where optimistic
> spinning and lock stealing are disabled to make it easier for the first
> waiter to acquire the lock.
>
> For mutex, lock handoff is done at unlock time as the owner value and
> the handoff bit is in the same lock word and can be updated atomically.
>
> That is not the case for rwsem, which has a separate count value for
> locking and an owner value. The only way to update them in a
> quasi-atomic way is to use the wait_lock for synchronization, as the
> handoff bit can only be updated while holding the wait_lock. So for
> rwsem, the new lock handoff mechanism is done mostly at rwsem_wake()
> time, when the wait_lock has to be acquired anyway, to minimize
> additional overhead.
So for first==reader, sure, and you don't need anything special, since
rwsem_mark_wake() already does the right thing.
But for first==writer, I don't follow; *WHY* do you have to complicate
this path so. The write_slowpath already takes wait_lock for
rwsem_try_write_lock() and that already knows about handoff.
> It is also likely that the active lock in this case may be a transient
> RWSEM_READER_BIAS that will be removed soon. So we have a secondary
> handoff done in the reader slow path to handle this particular case.
Only because you made it so damn complicated. If instead you rely on the
wait_lock in write_slowpath you can keep it all in one place AFAICT.
> For a reader-owned rwsem, the owner value other than the
> RWSEM_READER_OWNED bit is mostly for debugging purposes only. So it is
> not safe to use the owner value to confirm that a handoff to a reader
> has happened. On the
What ?!? Where do we care about the owner value? There's
RWSEM_FLAG_HANDOFF which lives in sem->count and there's
waiter->handoff_set. Nowhere do we care about sem->owner in this.
> other hand, we can do that when handing off to a writer. However, it
> is simpler to use the same mechanism to signal that a handoff has
> happened for both readers and writers. So a new HANDOFF_GRANTED state
> is added
I really can't follow whatever logic jump here.
> to enum rwsem_handoff_state to signify that. This new value will be
> written to the handoff_state field of the first waiter.
>
> With true lock handoff, there is no need to do NULL-owner spinning
> anymore as a wakeup will be performed if the handoff is successful. So
> it is likely that the first waiter won't actually go to sleep even
> when schedule() is called in this case.
So this spinning, this is purely for writer->writer handoff (which is
exceedingly rare since it is readers that set handoff), right?
Why is that so important?
Also, why can't we add something like:

	owner = rwsem_owner_flags(sem, &flags);
	if (owner && !(flags & RWSEM_READER_OWNED))
		atomic_long_cond_read_relaxed(&sem->count, !(VAL & RWSEM_WRITER_LOCKED));

to the start of that? If it's really needed.