Message-ID: <183ee6fa-1d42-4a01-8446-4f20942680d2@redhat.com>
Date: Tue, 13 Aug 2024 15:01:36 -0400
From: Waiman Long <longman@...hat.com>
To: cl@...two.org, Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Boqun Feng <boqun.feng@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH RFC] Avoid memory barrier in read_seqcount() through load acquire
On 8/13/24 14:26, Christoph Lameter via B4 Relay wrote:
> From: "Christoph Lameter (Ampere)" <cl@...two.org>
>
> Some architectures support load acquire which can save us a memory
> barrier and save some cycles.
>
> A typical sequence
>
> do {
> seq = read_seqcount_begin(&s);
> <something>
> } while (read_seqcount_retry(&s, seq));
>
> requires 13 cycles on ARM64 for an empty loop. Two read memory barriers are
> needed. One for each of the seqcount_* functions.
>
> We can replace the first read barrier with a load acquire of
> the seqcount which saves us one barrier.
>
> On ARM64 doing so reduces the cycle count from 13 to 8.
>
> Signed-off-by: Christoph Lameter (Ampere) <cl@...two.org>
> ---
> arch/Kconfig | 5 +++++
> arch/arm64/Kconfig | 1 +
> include/linux/seqlock.h | 41 +++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 47 insertions(+)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 975dd22a2dbd..3f8867110a57 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1600,6 +1600,11 @@ config ARCH_HAS_KERNEL_FPU_SUPPORT
> Architectures that select this option can run floating-point code in
> the kernel, as described in Documentation/core-api/floating-point.rst.
>
> +config ARCH_HAS_ACQUIRE_RELEASE
> + bool
> + help
> + Architectures that support acquire / release can avoid memory fences
> +
> source "kernel/gcov/Kconfig"
>
> source "scripts/gcc-plugins/Kconfig"
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2f8ff354ca6..19e34fff145f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -39,6 +39,7 @@ config ARM64
> select ARCH_HAS_PTE_DEVMAP
> select ARCH_HAS_PTE_SPECIAL
> select ARCH_HAS_HW_PTE_YOUNG
> + select ARCH_HAS_ACQUIRE_RELEASE
> select ARCH_HAS_SETUP_DMA_OPS
> select ARCH_HAS_SET_DIRECT_MAP
> select ARCH_HAS_SET_MEMORY
Do we need a new ARCH flag? I believe barrier APIs like
smp_load_acquire() fall back to a full barrier on architectures that
don't define their own smp_load_acquire().
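To illustrate the fallback pattern described above, here is a minimal userspace sketch using C11 atomics. It is not kernel code: my_load_acquire(), generic_load_acquire() and read_counter() are hypothetical names, and the full fence only approximates what a generic kernel fallback would do. The point is that callers can use one API unconditionally, while an architecture that has a cheaper load-acquire instruction supplies its own definition.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Userspace sketch only; all names are hypothetical, not kernel APIs.
 * An "architecture" may define my_load_acquire() before this point;
 * otherwise the generic version below is used.
 */
#ifndef my_load_acquire
static inline unsigned int generic_load_acquire(atomic_uint *p)
{
	/* Plain (relaxed) load followed by a full fence. */
	unsigned int v = atomic_load_explicit(p, memory_order_relaxed);

	atomic_thread_fence(memory_order_seq_cst);
	return v;
}
#define my_load_acquire(p) generic_load_acquire(p)
#endif

/* Callers use my_load_acquire() unconditionally; no config flag is needed. */
static unsigned int read_counter(atomic_uint *counter)
{
	return my_load_acquire(counter);
}
```

With this shape, the choice between "real" load-acquire and "load plus full barrier" is made where the primitive is defined rather than at each call site.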
BTW, acquire/release operations can be considered memory barriers too.
I assume you mean preferring acquire/release barriers over read/write
barriers, right?
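As a rough illustration of that distinction, the sketch below shows a sequence-counter reader in userspace C11, once with a relaxed load followed by a separate fence and once with the ordering folded into an acquire load. This is an assumption-laden analogy, not the kernel's seqcount implementation: struct seq_demo and all function names are hypothetical, and a C11 acquire fence is only an approximation of a kernel read barrier.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical demo type: even seq means stable, odd means write in progress. */
struct seq_demo {
	atomic_uint seq;
	atomic_int a, b;	/* data protected by the sequence counter */
};

/* Variant 1: relaxed load, then a separate fence (barrier-style). */
static unsigned int begin_fence(struct seq_demo *s)
{
	unsigned int seq;

	while ((seq = atomic_load_explicit(&s->seq, memory_order_relaxed)) & 1)
		;	/* spin while a writer is active */
	atomic_thread_fence(memory_order_acquire);
	return seq;
}

/* Variant 2: the ordering is part of the load itself (load acquire). */
static unsigned int begin_acquire(struct seq_demo *s)
{
	unsigned int seq;

	while ((seq = atomic_load_explicit(&s->seq, memory_order_acquire)) & 1)
		;	/* spin while a writer is active */
	return seq;
}

/* The retry check still needs a fence: data loads must precede the re-read. */
static int retry(struct seq_demo *s, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

/* Reader loop in the same shape as the sequence quoted in the patch. */
static int read_sum(struct seq_demo *s)
{
	unsigned int seq;
	int a, b;

	do {
		seq = begin_acquire(s);
		a = atomic_load_explicit(&s->a, memory_order_relaxed);
		b = atomic_load_explicit(&s->b, memory_order_relaxed);
	} while (retry(s, seq));

	return a + b;
}
```

Both begin variants order the later data loads after the counter load; they differ only in whether that ordering is expressed as a standalone barrier or as a property of the load.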
Cheers,
Longman