[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <bafdd02e-68f8-a55c-68fe-8fe165cfe9d4@redhat.com>
Date: Wed, 22 Sep 2021 21:45:08 -0400
From: Waiman Long <llong@...hat.com>
To: Boqun Feng <boqun.feng@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Will Deacon <will.deacon@....com>,
linux-kernel@...r.kernel.org, Davidlohr Bueso <dave@...olabs.net>
Subject: Re: [RFC PATCH] locking/rwsem: Add upgrade_read()
On 9/22/21 9:16 PM, Boqun Feng wrote:
> On Wed, Sep 22, 2021 at 03:36:57PM -0400, Waiman Long wrote:
>> Currently there are about 12 instances in the kernel where an up_read()
>> is immediately followed by a down_write() of the same lock. For example,
>>
>> drivers/tty/n_tty.c: up_read(&tty->termios_rwsem);
>> drivers/tty/n_tty.c- down_write(&tty->termios_rwsem);
>>
>> Since we have already provided a downgrade_write() function, we may as
>> well provide an upgrade_read() function to make the code easier to read
>> and the intention clearer.
>>
>> If the current task is the only reader, the upgrade can be done by a
>> single atomic operation. If not, the upgrade will have to be done by a
>> separate up_read() call followed by a down_write(). In the former case,
>> the handoff bit is not considered and the waiter will have to wait a
>> bit longer to acquire the lock.
>>
>> The new upgrade_read() function returns a value of 0 for safe upgrade
>> where rwsem protected data won't change. Otherwise a value of 1 is
>> returned to indicate unsafe upgrade where rwsem protected data may
>> change during the upgrade process.
>>
>> For PREEMPT_RT, it falls back to up_read() followed by down_write()
>> for simplicity.
>>
>> Some uses of down_write() with long lock hold time may be changed
>> to the following format in the future:
>>
>> down_read()
>> /* check data */
>> if (upgrade_read()) {
>> /* unsafe upgrade, recheck data */
>> }
>> /* update data */
>> up_write();
>>
>> As long as the "recheck data" and "update data" parts are relatively
>> short compared with the "check data" part, this conversion may help to
>> improve parallelism and reduce lock contention.
>>
>> Signed-off-by: Waiman Long <longman@...hat.com>
>> ---
>> include/linux/rwsem.h | 5 ++++
>> kernel/locking/rwsem.c | 53 ++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 58 insertions(+)
>>
>> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
>> index 352c6127cb90..8ece58224f25 100644
>> --- a/include/linux/rwsem.h
>> +++ b/include/linux/rwsem.h
>> @@ -207,6 +207,11 @@ extern void up_write(struct rw_semaphore *sem);
>> */
>> extern void downgrade_write(struct rw_semaphore *sem);
>>
>> +/*
>> + * upgrade read lock to write lock
>> + */
>> +extern int upgrade_read(struct rw_semaphore *sem);
>> +
>> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>> /*
>> * nested locking. NOTE: rwsems are not allowed to recurse
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index 000e8d5a2884..aeb5b0668304 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -1203,6 +1203,29 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
>> return sem;
>> }
>>
>> +/*
>> + * Try to upgrade read lock to write lock
>> + */
>> +static inline int __try_upgrade_read(struct rw_semaphore *sem)
>> +{
>> + long count = atomic_long_read(&sem->count);
>> +
>> + WARN_ON_ONCE(count & RWSEM_WRITER_LOCKED);
>> +
>> + /*
>> + * When upgrading from shared to exclusive ownership,
>> + * anything inside the write-locked region cannot leak
>> + * into the read side. Use an ACQUIRE semantics.
>> + */
>> + if (((count & RWSEM_READER_MASK) == RWSEM_READER_BIAS) &&
>> + atomic_long_try_cmpxchg_acquire(&sem->count, &count,
>> + count - RWSEM_READER_BIAS + RWSEM_WRITER_LOCKED)) {
>> + rwsem_set_owner(sem);
>> + return 1;
>> + }
>> + return 0;
>> +}
>> +
>> /*
>> * lock for reading
>> */
>> @@ -1438,6 +1461,11 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
>> rwbase_write_downgrade(&sem->rwbase);
>> }
>>
>> +static inline int __try_upgrade_read(struct rw_semaphore *sem)
>> +{
>> + return 0;
>> +}
>> +
>> /* Debug stubs for the common API */
>> #define DEBUG_RWSEMS_WARN_ON(c, sem)
>>
>> @@ -1581,6 +1609,31 @@ void downgrade_write(struct rw_semaphore *sem)
>> }
>> EXPORT_SYMBOL(downgrade_write);
>>
>> +/*
>> + * Upgrade read lock to write lock
>> + *
>> + * Return: 0 when upgrade is safe, i.e. rwsem protected data do not change;
>> + * 1 when upgrade is unsafe as rwsem protected data may have changed.
>> + */
>> +int upgrade_read(struct rw_semaphore *sem)
>> +{
>> + if (__try_upgrade_read(sem)) {
>> + rwsem_release(&sem->dep_map, _RET_IP_);
>> + rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
>> + return 0;
>> + }
>> +
>> + /*
>> + * We cannot directly upgrade to the write lock, just do a regular
>> + * up_read() and down_write() sequence. The data protected by the
>> + * rwsem may have changed before the write lock is acquired.
>> + */
>> + down_read(sem);
>> + up_write(sem);
> Confused, the comment says up_read()+down_write(), however the code is
> down_read()+up_write().
Thanks for catching that typo. My bad.
>
> Besides, I don't like the idea that the value may have changed before
> the write lock is acquired if we call it "upgrade". Maybe we want api
> like down_read_upgradable(), which can be held in parallel with other
> down_read() but no other down_read_upgradable(), and one can only
> upgrade the read-side critical section created by
> down_read_upgradable(). For implementation, that means we need to have
> one extra bit for upgradable. Thoughts?
I like your idea. There are spare bits available and we can dedicate one
bit for that purpose. After successfully acquire the bit the reader can
probably spin a little bit and then insert itself to the head of the
wait queue to sleep. The last exiting reader can wake it up to acquire
the write lock.
I will probably use "try_upgrade_read() to indicate that the attempt may
fail.
Thanks for the suggestion.
Cheers,
Longman
Powered by blists - more mailing lists