lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <183ee6fa-1d42-4a01-8446-4f20942680d2@redhat.com>
Date: Tue, 13 Aug 2024 15:01:36 -0400
From: Waiman Long <longman@...hat.com>
To: cl@...two.org, Catalin Marinas <catalin.marinas@....com>,
 Will Deacon <will@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
 Ingo Molnar <mingo@...hat.com>, Boqun Feng <boqun.feng@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH RFC] Avoid memory barrier in read_seqcount() through load
 acquire


On 8/13/24 14:26, Christoph Lameter via B4 Relay wrote:
> From: "Christoph Lameter (Ampere)" <cl@...two.org>
>
> Some architectures support load acquire which can save us a memory
> barrier and save some cycles.
>
> A typical sequence
>
> 	do {
> 		seq = read_seqcount_begin(&s);
> 		<something>
> 	} while (read_seqcount_retry(&s, seq);
>
> requires 13 cycles on ARM64 for an empty loop. Two read memory barriers are
> needed. One for each of the seqcount_* functions.
>
> We can replace the first read barrier with a load acquire of
> the seqcount which saves us one barrier.
>
> On ARM64 doing so reduces the cycle count from 13 to 8.
>
> Signed-off-by: Christoph Lameter (Ampere) <cl@...two.org>
> ---
>   arch/Kconfig            |  5 +++++
>   arch/arm64/Kconfig      |  1 +
>   include/linux/seqlock.h | 41 +++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 47 insertions(+)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 975dd22a2dbd..3f8867110a57 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1600,6 +1600,11 @@ config ARCH_HAS_KERNEL_FPU_SUPPORT
>   	  Architectures that select this option can run floating-point code in
>   	  the kernel, as described in Documentation/core-api/floating-point.rst.
>   
> +config ARCH_HAS_ACQUIRE_RELEASE
> +	bool
> +	help
> +	  Architectures that support acquire / release can avoid memory fences
> +
>   source "kernel/gcov/Kconfig"
>   
>   source "scripts/gcc-plugins/Kconfig"
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2f8ff354ca6..19e34fff145f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -39,6 +39,7 @@ config ARM64
>   	select ARCH_HAS_PTE_DEVMAP
>   	select ARCH_HAS_PTE_SPECIAL
>   	select ARCH_HAS_HW_PTE_YOUNG
> +	select ARCH_HAS_ACQUIRE_RELEASE
>   	select ARCH_HAS_SETUP_DMA_OPS
>   	select ARCH_HAS_SET_DIRECT_MAP
>   	select ARCH_HAS_SET_MEMORY

Do we need a new ARCH flag? I believe barrier APIs like 
smp_load_acquire() will use the full barrier for those arch'es that 
don't define their own smp_load_acquire().

BTW, acquire/release can be considered memory barriers too. Maybe you 
are talking about preferring acquire/release barriers over read/write 
barriers. Right?

Cheers,
Longman


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ