Date:   Thu, 18 Nov 2021 13:57:35 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Muchun Song <songmuchun@...edance.com>
Cc:     mingo@...hat.com, will@...nel.org, longman@...hat.com,
        boqun.feng@...il.com, linux-kernel@...r.kernel.org,
        duanxiongchun@...edance.com, zhengqi.arch@...edance.com
Subject: Re: [PATCH] locking/rwsem: Optimize down_read_trylock() under highly
 contended case

On Thu, Nov 18, 2021 at 05:44:55PM +0800, Muchun Song wrote:

> By using the above benchmark, the real execution times on an x86-64 system
> before and after the patch were:

What kind of x86_64 ?
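
The benchmark itself isn't quoted here; purely to illustrate the shape of
such a trylock stress test, a hypothetical userspace analogue (not the actual
benchmark behind the numbers below; NTHREADS and ITERATIONS are made up)
could look like this:

/*
 * Hypothetical stress test: NTHREADS threads spin on
 * pthread_rwlock_tryrdlock()/unlock() for a fixed number of iterations
 * and the elapsed wall-clock time is printed at the end.
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS	40
#define ITERATIONS	(1 << 20)

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

static void *worker(void *arg)
{
	long i;

	for (i = 0; i < ITERATIONS; i++) {
		if (pthread_rwlock_tryrdlock(&lock) == 0)
			pthread_rwlock_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	struct timespec t0, t1;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("elapsed: %.3f ms\n",
	       (t1.tv_sec - t0.tv_sec) * 1e3 +
	       (t1.tv_nsec - t0.tv_nsec) / 1e6);
	return 0;
}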

> 
>                   Before Patch  After Patch
>    # of Threads      real          real     reduced by
>    ------------     ------        ------    ----------
>          1          65,373        65,206       ~0.0%
>          4          15,467        15,378       ~0.5%
>         40           6,214         5,528      ~11.0%
> 
> For the uncontended case, the new down_read_trylock() is the same as
> before. For the contended cases, the new down_read_trylock() is faster
> than before; the heavier the contention, the bigger the improvement.
> 
> Signed-off-by: Muchun Song <songmuchun@...edance.com>
> ---
>  kernel/locking/rwsem.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index c51387a43265..ef2b2a3f508c 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -1249,17 +1249,14 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
>  
>  	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);
>  
> -	/*
> -	 * Optimize for the case when the rwsem is not locked at all.
> -	 */
> -	tmp = RWSEM_UNLOCKED_VALUE;
> -	do {
> +	tmp = atomic_long_read(&sem->count);
> +	while (!(tmp & RWSEM_READ_FAILED_MASK)) {
>  		if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
> -					tmp + RWSEM_READER_BIAS)) {
> +						    tmp + RWSEM_READER_BIAS)) {
>  			rwsem_set_reader_owned(sem);
>  			return 1;
>  		}
> -	} while (!(tmp & RWSEM_READ_FAILED_MASK));
> +	}
>  	return 0;
>  }
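
For readability, this is roughly how __down_read_trylock() reads with the
hunk applied (reconstructed from the diff above; the local "long tmp;"
declaration is assumed from the elided context):

static inline int __down_read_trylock(struct rw_semaphore *sem)
{
	long tmp;

	DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);

	/*
	 * Read the count once up front; on a failed cmpxchg, tmp is
	 * refreshed with the current value and we keep retrying as
	 * long as none of the reader-fail bits are set.
	 */
	tmp = atomic_long_read(&sem->count);
	while (!(tmp & RWSEM_READ_FAILED_MASK)) {
		if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
						    tmp + RWSEM_READER_BIAS)) {
			rwsem_set_reader_owned(sem);
			return 1;
		}
	}
	return 0;
}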

This is weird... so the only difference is that leading load. But given
contention you'd expect that load to stall, and also, since it's a
non-exclusive load, to have the line stolen by a competing CPU. Whereas the
old code would start with a cmpxchg, which obviously will also stall, but at
least does an exclusive load.

And the thinking is that the exclusive load and the presence of the
cmpxchg loop would keep the line on that CPU for a little while and
progress is made.
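
To make that concrete, the only difference between the two fast paths is
which kind of access touches sem->count first (a sketch only, with the
reader-owned bookkeeping elided):

	/*
	 * Old: the very first access to sem->count is the cmpxchg
	 * itself, i.e. an exclusive (read-for-ownership) access.
	 */
	tmp = RWSEM_UNLOCKED_VALUE;
	do {
		if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
						    tmp + RWSEM_READER_BIAS))
			return 1;
	} while (!(tmp & RWSEM_READ_FAILED_MASK));

	/*
	 * New: the first access is a plain (shared) load, so a competing
	 * CPU can pull the line away again before the cmpxchg runs, but
	 * the cmpxchg is never issued with a known-stale guess.
	 */
	tmp = atomic_long_read(&sem->count);
	while (!(tmp & RWSEM_READ_FAILED_MASK)) {
		if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
						    tmp + RWSEM_READER_BIAS))
			return 1;
	}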

Clearly this isn't working as expected. Also I suppose it would need
wider testing...
