lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <e6095b17-d825-e60d-c15e-1c93c3289ce5@redhat.com>
Date:   Tue, 27 Dec 2022 22:56:48 -0500
From:   Waiman Long <longman@...hat.com>
To:     guoren@...nel.org, peterz@...radead.org
Cc:     linux-kernel@...r.kernel.org, Guo Ren <guoren@...ux.alibaba.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Will Deacon <will@...nel.org>, Ingo Molnar <mingo@...hat.com>
Subject: Re: [RFC PATCH] locking/barriers: Introduce
 smp_cond_load_mask_relaxed & acquire

On 12/25/22 06:55, guoren@...nel.org wrote:
> From: Guo Ren <guoren@...ux.alibaba.com>
>
> The current cond_load primitive contains two parts (condition expression
> and load value), but the usage of cond_load may require the sub-size
> condition expression of the load size. That means hardware could utilize
> a mask argument to optimize the wait condition. If the mask argument
> size is less than the hardware minimum wait size, the hardware uses its
> minimum size.
>
> The patch contains a qspinlock example: When it is at the head of the
> waitqueue, it waits for the owner & pending to go away. The forward
> progress condition only cares locked_pending part, but it needs to load
> the 32-bit lock value as a return.
>
> That also means WFE-liked instruction would need a mask argument of the
> load reservation set.
>
> Signed-off-by: Guo Ren <guoren@...ux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@...nel.org>
> Cc: Waiman Long <longman@...hat.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Boqun Feng <boqun.feng@...il.com>
> Cc: Will Deacon <will@...nel.org>
> Cc: Ingo Molnar <mingo@...hat.com>
> ---
>   include/asm-generic/barrier.h | 22 ++++++++++++++++++++++
>   include/linux/atomic.h        |  4 ++++
>   kernel/locking/qspinlock.c    |  3 ++-
>   3 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index 961f4d88f9ef..fec61629f769 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -275,6 +275,28 @@ do {									\
>   })
>   #endif
>   
> +/**
> + * smp_cond_load_mask_relaxed() - (Spin) wait for cond with ACQUIRE ordering
> + * @ptr: pointer to the variable to wait on
> + * @cond: boolean expression to wait for
> + * @mask: mask *ptr to wait for (effect of 0 is the same with -1)
> + */
> +#ifndef smp_cond_load_mask_relaxed
> +#define smp_cond_load_mask_relaxed(ptr, cond_expr, mask)	\
> +	smp_cond_load_relaxed(ptr, cond_expr)
> +#endif
> +
> +/**
> + * smp_cond_load_mask_acquire() - (Spin) wait for cond with ACQUIRE ordering
> + * @ptr: pointer to the variable to wait on
> + * @cond: boolean expression to wait for
> + * @mask: mask *ptr to wait for (effect of 0 is the same with -1)
> + */
> +#ifndef smp_cond_load_mask_acquire
> +#define smp_cond_load_mask_acquire(ptr, cond_expr, mask)	\
> +	smp_cond_load_acquire(ptr, cond_expr)
> +#endif
> +
>   /*
>    * pmem_wmb() ensures that all stores for which the modification
>    * are written to persistent storage by preceding instructions have
> diff --git a/include/linux/atomic.h b/include/linux/atomic.h
> index 8dd57c3a99e9..dc7351945f27 100644
> --- a/include/linux/atomic.h
> +++ b/include/linux/atomic.h
> @@ -27,9 +27,13 @@
>   
>   #define atomic_cond_read_acquire(v, c) smp_cond_load_acquire(&(v)->counter, (c))
>   #define atomic_cond_read_relaxed(v, c) smp_cond_load_relaxed(&(v)->counter, (c))
> +#define atomic_cond_read_mask_acquire(v, c, m) smp_cond_load_mask_acquire(&(v)->counter, (c), (m))
> +#define atomic_cond_read_mask_relaxed(v, c, m) smp_cond_load_mask_relaxed(&(v)->counter, (c), (m))
>   
>   #define atomic64_cond_read_acquire(v, c) smp_cond_load_acquire(&(v)->counter, (c))
>   #define atomic64_cond_read_relaxed(v, c) smp_cond_load_relaxed(&(v)->counter, (c))
> +#define atomic64_cond_read_mask_acquire(v, c, m) smp_cond_load_mask_acquire(&(v)->counter, (c), (m))
> +#define atomic64_cond_read_mask_relaxed(v, c, m) smp_cond_load_mask_relaxed(&(v)->counter, (c), (m))
>   
>   /*
>    * The idea here is to build acquire/release variants by adding explicit
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index ebe6b8ec7cb3..14fdd2ee752c 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -511,7 +511,8 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>   	if ((val = pv_wait_head_or_lock(lock, node)))
>   		goto locked;
>   
> -	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
> +	val = atomic_cond_read_mask_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK),
> +					    _Q_LOCKED_PENDING_MASK);
>   
>   locked:
>   	/*

This patch is essentially a no-op. You will have to have at least one 
arch that has its own version of smp_cond_load_mask*() and get some 
benefit out of it. Otherwise, it is not likely to be merged.

Cheers,
Longman

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ