linux-kernel - Re: [RFC PATCH] locking/rwsem: Add upgrade

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YUvVg6OJLIqg/rUZ@boqun-archlinux>
Date:   Thu, 23 Sep 2021 09:16:51 +0800
From:   Boqun Feng <boqun.feng@...il.com>
To:     Waiman Long <longman@...hat.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Will Deacon <will.deacon@....com>,
        linux-kernel@...r.kernel.org, Davidlohr Bueso <dave@...olabs.net>
Subject: Re: [RFC PATCH] locking/rwsem: Add upgrade_read()

On Wed, Sep 22, 2021 at 03:36:57PM -0400, Waiman Long wrote:
> Currently there are about 12 instances in the kernel where an up_read()
> is immediately followed by a down_write() of the same lock. For example,
> 
>   drivers/tty/n_tty.c:		up_read(&tty->termios_rwsem);
>   drivers/tty/n_tty.c-		down_write(&tty->termios_rwsem);
> 
> Since we have already provided a downgrade_write() function, we may as
> well provide an upgrade_read() function to make the code easier to read
> and the intention clearer.
> 
> If the current task is the only reader, the upgrade can be done by a
> single atomic operation. If not, the upgrade will have to be done by a
> separate up_read() call followed by a down_write(). In the former case,
> the handoff bit is not considered and the waiter will have to wait a
> bit longer to acquire the lock.
> 
> The new upgrade_read() function returns a value of 0 for safe upgrade
> where rwsem protected data won't change. Otherwise a value of 1 is
> returned to indicate unsafe upgrade where rwsem protected data may
> change during the upgrade process.
> 
> For PREEMPT_RT, it falls back to up_read() followed by down_write()
> for simplicity.
> 
> Some uses of down_write() with long lock hold time may be changed
> to the following format in the future:
> 
> 	down_read()
> 	/* check data */
> 	if (upgrade_read()) {
> 		/* unsafe upgrade, recheck data */
> 	}
> 	/* update data */
> 	up_write();
> 
> As long as the "recheck data" and "update data" parts are relatively
> short compared with the "check data" part, this conversion may help to
> improve parallelism and reduce lock contention.
> 
> Signed-off-by: Waiman Long <longman@...hat.com>
> ---
>  include/linux/rwsem.h  |  5 ++++
>  kernel/locking/rwsem.c | 53 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 58 insertions(+)
> 
> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
> index 352c6127cb90..8ece58224f25 100644
> --- a/include/linux/rwsem.h
> +++ b/include/linux/rwsem.h
> @@ -207,6 +207,11 @@ extern void up_write(struct rw_semaphore *sem);
>   */
>  extern void downgrade_write(struct rw_semaphore *sem);
>  
> +/*
> + * upgrade read lock to write lock
> + */
> +extern int upgrade_read(struct rw_semaphore *sem);
> +
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  /*
>   * nested locking. NOTE: rwsems are not allowed to recurse
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 000e8d5a2884..aeb5b0668304 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -1203,6 +1203,29 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
>  	return sem;
>  }
>  
> +/*
> + * Try to upgrade read lock to write lock
> + */
> +static inline int __try_upgrade_read(struct rw_semaphore *sem)
> +{
> +	long count = atomic_long_read(&sem->count);
> +
> +	WARN_ON_ONCE(count & RWSEM_WRITER_LOCKED);
> +
> +	/*
> +	 * When upgrading from shared to exclusive ownership,
> +	 * anything inside the write-locked region cannot leak
> +	 * into the read side. Use an ACQUIRE semantics.
> +	 */
> +	if (((count & RWSEM_READER_MASK) == RWSEM_READER_BIAS) &&
> +	     atomic_long_try_cmpxchg_acquire(&sem->count, &count,
> +			count - RWSEM_READER_BIAS + RWSEM_WRITER_LOCKED)) {
> +		rwsem_set_owner(sem);
> +		return 1;
> +	}
> +	return 0;
> +}
> +
>  /*
>   * lock for reading
>   */
> @@ -1438,6 +1461,11 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
>  	rwbase_write_downgrade(&sem->rwbase);
>  }
>  
> +static inline int __try_upgrade_read(struct rw_semaphore *sem)
> +{
> +	return 0;
> +}
> +
>  /* Debug stubs for the common API */
>  #define DEBUG_RWSEMS_WARN_ON(c, sem)
>  
> @@ -1581,6 +1609,31 @@ void downgrade_write(struct rw_semaphore *sem)
>  }
>  EXPORT_SYMBOL(downgrade_write);
>  
> +/*
> + * Upgrade read lock to write lock
> + *
> + * Return: 0 when upgrade is safe, i.e. rwsem protected data do not change;
> + *         1 when upgrade is unsafe as rwsem protected data may have changed.
> + */
> +int upgrade_read(struct rw_semaphore *sem)
> +{
> +	if (__try_upgrade_read(sem)) {
> +		rwsem_release(&sem->dep_map, _RET_IP_);
> +		rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
> +		return 0;
> +	}
> +
> +	/*
> +	 * We cannot directly upgrade to the write lock, just do a regular
> +	 * up_read() and down_write() sequence. The data protected by the
> +	 * rwsem may have changed before the write lock is acquired.
> +	 */
> +	down_read(sem);
> +	up_write(sem);

Confused, the comment says up_read()+down_write(), however the code is
down_read()+up_write().

Besides, I don't like the idea that the value may have changed before
the write lock is acquired if we call it "upgrade". Maybe we want api
like down_read_upgradable(), which can be held in parallel with other
down_read() but no other down_read_upgradable(), and one can only
upgrade the read-side critical section created by
down_read_upgradable(). For implementation, that means we need to have
one extra bit for upgradable. Thoughts?

Regards,
Boqun

> +	return 1;
> +}
> +EXPORT_SYMBOL(upgrade_read);
> +
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  
>  void down_read_nested(struct rw_semaphore *sem, int subclass)
> -- 
> 2.18.1
>