lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Wed, 22 Sep 2021 21:45:08 -0400 From: Waiman Long <llong@...hat.com> To: Boqun Feng <boqun.feng@...il.com> Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, Will Deacon <will.deacon@....com>, linux-kernel@...r.kernel.org, Davidlohr Bueso <dave@...olabs.net> Subject: Re: [RFC PATCH] locking/rwsem: Add upgrade_read() On 9/22/21 9:16 PM, Boqun Feng wrote: > On Wed, Sep 22, 2021 at 03:36:57PM -0400, Waiman Long wrote: >> Currently there are about 12 instances in the kernel where an up_read() >> is immediately followed by a down_write() of the same lock. For example, >> >> drivers/tty/n_tty.c: up_read(&tty->termios_rwsem); >> drivers/tty/n_tty.c- down_write(&tty->termios_rwsem); >> >> Since we have already provided a downgrade_write() function, we may as >> well provide an upgrade_read() function to make the code easier to read >> and the intention clearer. >> >> If the current task is the only reader, the upgrade can be done by a >> single atomic operation. If not, the upgrade will have to be done by a >> separate up_read() call followed by a down_write(). In the former case, >> the handoff bit is not considered and the waiter will have to wait a >> bit longer to acquire the lock. >> >> The new upgrade_read() function returns a value of 0 for safe upgrade >> where rwsem protected data won't change. Otherwise a value of 1 is >> returned to indicate unsafe upgrade where rwsem protected data may >> change during the upgrade process. >> >> For PREEMPT_RT, it falls back to up_read() followed by down_write() >> for simplicity. >> >> Some uses of down_write() with long lock hold time may be changed >> to the following format in the future: >> >> down_read() >> /* check data */ >> if (upgrade_read()) { >> /* unsafe upgrade, recheck data */ >> } >> /* update data */ >> up_write(); >> >> As long as the "recheck data" and "update data" parts are relatively >> short compared with the "check data" part, this conversion may help to >> improve parallelism and reduce lock contention. >> >> Signed-off-by: Waiman Long <longman@...hat.com> >> --- >> include/linux/rwsem.h | 5 ++++ >> kernel/locking/rwsem.c | 53 ++++++++++++++++++++++++++++++++++++++++++ >> 2 files changed, 58 insertions(+) >> >> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h >> index 352c6127cb90..8ece58224f25 100644 >> --- a/include/linux/rwsem.h >> +++ b/include/linux/rwsem.h >> @@ -207,6 +207,11 @@ extern void up_write(struct rw_semaphore *sem); >> */ >> extern void downgrade_write(struct rw_semaphore *sem); >> >> +/* >> + * upgrade read lock to write lock >> + */ >> +extern int upgrade_read(struct rw_semaphore *sem); >> + >> #ifdef CONFIG_DEBUG_LOCK_ALLOC >> /* >> * nested locking. NOTE: rwsems are not allowed to recurse >> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c >> index 000e8d5a2884..aeb5b0668304 100644 >> --- a/kernel/locking/rwsem.c >> +++ b/kernel/locking/rwsem.c >> @@ -1203,6 +1203,29 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem) >> return sem; >> } >> >> +/* >> + * Try to upgrade read lock to write lock >> + */ >> +static inline int __try_upgrade_read(struct rw_semaphore *sem) >> +{ >> + long count = atomic_long_read(&sem->count); >> + >> + WARN_ON_ONCE(count & RWSEM_WRITER_LOCKED); >> + >> + /* >> + * When upgrading from shared to exclusive ownership, >> + * anything inside the write-locked region cannot leak >> + * into the read side. Use an ACQUIRE semantics. >> + */ >> + if (((count & RWSEM_READER_MASK) == RWSEM_READER_BIAS) && >> + atomic_long_try_cmpxchg_acquire(&sem->count, &count, >> + count - RWSEM_READER_BIAS + RWSEM_WRITER_LOCKED)) { >> + rwsem_set_owner(sem); >> + return 1; >> + } >> + return 0; >> +} >> + >> /* >> * lock for reading >> */ >> @@ -1438,6 +1461,11 @@ static inline void __downgrade_write(struct rw_semaphore *sem) >> rwbase_write_downgrade(&sem->rwbase); >> } >> >> +static inline int __try_upgrade_read(struct rw_semaphore *sem) >> +{ >> + return 0; >> +} >> + >> /* Debug stubs for the common API */ >> #define DEBUG_RWSEMS_WARN_ON(c, sem) >> >> @@ -1581,6 +1609,31 @@ void downgrade_write(struct rw_semaphore *sem) >> } >> EXPORT_SYMBOL(downgrade_write); >> >> +/* >> + * Upgrade read lock to write lock >> + * >> + * Return: 0 when upgrade is safe, i.e. rwsem protected data do not change; >> + * 1 when upgrade is unsafe as rwsem protected data may have changed. >> + */ >> +int upgrade_read(struct rw_semaphore *sem) >> +{ >> + if (__try_upgrade_read(sem)) { >> + rwsem_release(&sem->dep_map, _RET_IP_); >> + rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_); >> + return 0; >> + } >> + >> + /* >> + * We cannot directly upgrade to the write lock, just do a regular >> + * up_read() and down_write() sequence. The data protected by the >> + * rwsem may have changed before the write lock is acquired. >> + */ >> + down_read(sem); >> + up_write(sem); > Confused, the comment says up_read()+down_write(), however the code is > down_read()+up_write(). Thanks for catching that typo. My bad. > > Besides, I don't like the idea that the value may have changed before > the write lock is acquired if we call it "upgrade". Maybe we want api > like down_read_upgradable(), which can be held in parallel with other > down_read() but no other down_read_upgradable(), and one can only > upgrade the read-side critical section created by > down_read_upgradable(). For implementation, that means we need to have > one extra bit for upgradable. Thoughts? I like your idea. There are spare bits available and we can dedicate one bit for that purpose. After successfully acquire the bit the reader can probably spin a little bit and then insert itself to the head of the wait queue to sleep. The last exiting reader can wake it up to acquire the write lock. I will probably use "try_upgrade_read() to indicate that the attempt may fail. Thanks for the suggestion. Cheers, Longman
Powered by blists - more mailing lists