lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wg98L2qaw1U-7NDFQi6dOK=CRO6H-1q1LXuEh348Dk=2Q@mail.gmail.com>
Date:   Thu, 19 Oct 2023 10:04:56 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Uros Bizjak <ubizjak@...il.com>, Nadav Amit <namit@...are.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Brian Gerst <brgerst@...il.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Nick Desaulniers <ndesaulniers@...gle.com>
Subject: Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

On Thu, 19 Oct 2023 at 01:54, Peter Zijlstra <peterz@...radead.org> wrote:
>
> seqlock actually wants rmb even today, the pattern is:

That's the pattern today, and yes, it makes superficial sense, but no,
it's not a "really wants" issue.

>     do {
>         seq = load-seq
>         rmb
>         load-data
>         rmb
>     } while (seq != re-load-seq)
>
> we specifically only care about loads, and the data loads must be
> between the sequence number loads.
>
> As such, load-acquire is not a natural match.

You are correct that "rmb" _sounds_ logical. We do, after all, want to
order reads wrt each other.

So rmb is the obvious choice.

But the thing is, "read_acquire" actually does that too.

So if you do

        seq = load_acquire(orig_seq);
        load-data

then that acquire actually makes that first 'rmb' pointless. Acquire
already guarantees that all subsequent memory operations are ordered
wrt that read.

And 'acquire' is likely faster than 'rmb' on sane modern architectures.

On x86 it doesn't matter (rmb is a no-op, and all loads are acquires).

But on arm64, for example, you can do a 'ld.acq' in one instruction
and you're done - while a rmb then ends up being a barrier (ok, the
asm mnemonics are horrible: it's not "ld.acq", it's "ldar", but
whatever - I like arm64 as an architecture, but I think they made the
standard assembly syntax pointlessly and actively hostile to humans).

Of course then microarchitectures may end up doing basically the same
thing, but at least technically the 'load acquire' is likely more
targeted and more optimized.

The second rmb is then harder to change, and that is going to stay an
rmb ( you could say "do an acquire on the last data load, but that
doesn't fit the sane locking semantics of a sequence lock).

But I do think our sequence counters would be better off using
"smp_load_acquire()" for that initial read.

Of course, then the percpu case doesn't care about the SMP ordering,
but it should still use an UP barrier to make sure things don't get
re-ordered. Relying on our "percpu_read()" ordering other reads around
it is *wrong*.

             Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ