linux-kernel - Re: [PATCH v3] Avoid memory barrier in read


Message-ID: <e9fd5ba0-bd84-76a8-a96e-1378c66d0774@gentwo.org>
Date: Wed, 23 Oct 2024 16:42:36 -0700 (PDT)
From: "Christoph Lameter (Ampere)" <cl@...two.org>
To: Peter Zijlstra <peterz@...radead.org>
cc: Linus Torvalds <torvalds@...ux-foundation.org>, 
    Will Deacon <will@...nel.org>, Thomas Gleixner <tglx@...utronix.de>, 
    Catalin Marinas <catalin.marinas@....com>, Ingo Molnar <mingo@...hat.com>, 
    Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>, 
    linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
    linux-arm-kernel@...ts.infradead.org, linux-arch@...r.kernel.org
Subject: Re: [PATCH v3] Avoid memory barrier in read_seqcount() through load
 acquire

On Wed, 23 Oct 2024, Peter Zijlstra wrote:

> > I doubt anybody will notice, and smp_load_acquire() is the future. Any
> > architecture that does badly on it just doesn't matter (and, as
> > mentioned, I don't think they even exist - "smp_rmb()" is generally at
> > least as expensive).
>
> Do we want to do the complementing patch and make write_seqcount_end()
> use smp_store_release() ?
>
> I think at least ARM (the 32bit thing) has wmb but uses mb for
> store_release. But I also think I don't really care about that.

The proper operation would be something like

atomic_inc_release(&seqcount)

The current atomic API does not provide such an operation.

The closest in the current tree is atomic_inc_return_release().

We would prefer atomic_inc_release(&seqcount) because such an
atomic may be executed as a far atomic in the ARM mesh.

This could be cheaper than a local atomic; it could, for example, be
executed on the memory controller of a remote NUMA node to avoid a costly
transfer of cacheline ownership.

The generated code would be a single atomic instruction that also performs
the release, so no extra barrier or similar would be needed.
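To make the distinction concrete, here is a userspace sketch with C11 atomics. seq_inc_release() models the proposed atomic_inc_release(), which, as noted above, does not exist in the kernel; all function names here are hypothetical. C11 has only fetch-and-add, so the "no result" form is modelled by discarding the returned value. On arm64 with LSE atomics, a release add without a result corresponds to STADDL (an alias of LDADDL with the zero register as destination), but whether a given compiler emits it for this C code is not guaranteed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical analog of the proposed atomic_inc_release(): a release
 * increment whose result is not used. Not a kernel API. */
static inline void seq_inc_release(atomic_uint *seq)
{
    /* Discarding the old value leaves the compiler free to pick a
     * non-returning instruction, if the target has one. */
    (void)atomic_fetch_add_explicit(seq, 1, memory_order_release);
}

/* Analog of atomic_inc_return_release(), which does exist: the caller
 * gets the new value, so it has to come back to the issuing CPU. */
static inline unsigned seq_inc_return_release(atomic_uint *seq)
{
    return atomic_fetch_add_explicit(seq, 1, memory_order_release) + 1;
}

/* write_seqcount_end() as a single release RMW: no separate write
 * barrier precedes the increment. */
static inline void seq_write_end_rmw(atomic_uint *seq)
{
    /* Debug check only: a write section must be open (odd count). */
    assert(atomic_load_explicit(seq, memory_order_relaxed) & 1);
    seq_inc_release(seq);
}
```

Only the end of the write section is expressed this way here; the start of a write section still needs the odd count ordered before the following data stores, which a release increment by itself does not provide.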