Message-ID: <CAHk-=wgepFm=jGodFQYPAaEvcBhR3-f_h1BLBYiVQsutCwCnUQ@mail.gmail.com>
Date: Sun, 8 Oct 2023 10:59:43 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Uros Bizjak <ubizjak@...il.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Andy Lutomirski <luto@...nel.org>,
Ingo Molnar <mingo@...nel.org>, Nadav Amit <namit@...are.com>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
"H . Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>,
Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH 4/4] x86/percpu: Use C for percpu read/write accessors
On Wed, 4 Oct 2023 at 07:51, Uros Bizjak <ubizjak@...il.com> wrote:
>
> The percpu code mostly uses inline assembly. Using segment qualifiers
> allows to use C code instead, which enables the compiler to perform
> various optimizations (e.g. propagation of memory arguments). Convert
> percpu read and write accessors to C code, so the memory argument can
> be propagated to the instruction that uses this argument.
So apparently this causes boot failures.
It might be worth testing a version where this:
> +#define raw_cpu_read_1(pcp) __raw_cpu_read(, pcp)
> +#define raw_cpu_read_2(pcp) __raw_cpu_read(, pcp)
> +#define raw_cpu_read_4(pcp) __raw_cpu_read(, pcp)
> +#define raw_cpu_write_1(pcp, val) __raw_cpu_write(, pcp, val)
> +#define raw_cpu_write_2(pcp, val) __raw_cpu_write(, pcp, val)
> +#define raw_cpu_write_4(pcp, val) __raw_cpu_write(, pcp, val)
and this
> +#ifdef CONFIG_X86_64
> +#define raw_cpu_read_8(pcp) __raw_cpu_read(, pcp)
> +#define raw_cpu_write_8(pcp, val) __raw_cpu_write(, pcp, val)
were all using 'volatile' in the qualifier argument, to see if that
makes the boot failure go away.
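
For illustration, the effect of the qualifier argument can be sketched in plain userspace C. This is a minimal sketch under stated assumptions, not the patch's code: demo_read(), demo_write() and demo_counter are hypothetical names, an ordinary global stands in for a percpu variable, and no segment qualifier is involved. It relies on the GNU __typeof__ extension.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for accessors that take a qualifier argument.
 * The first macro argument is either empty or 'volatile' and is pasted
 * into the pointer type used for the access.
 */
#define demo_read(qual, var)       (*(qual __typeof__(var) *)&(var))
#define demo_write(qual, var, val) (*(qual __typeof__(var) *)&(var) = (val))

/* Ordinary global used in place of a percpu variable (assumption). */
static uint32_t demo_counter;

/* Empty qualifier: the compiler is free to merge these into one store. */
static inline void demo_write_twice_plain(uint32_t a, uint32_t b)
{
	demo_write(, demo_counter, a);
	demo_write(, demo_counter, b);
}

/* 'volatile' qualifier: both stores have to be emitted, in this order. */
static inline void demo_write_twice_volatile(uint32_t a, uint32_t b)
{
	demo_write(volatile, demo_counter, a);
	demo_write(volatile, demo_counter, b);
}

/* Volatile read: the load cannot be cached or dropped by the compiler. */
static inline uint32_t demo_read_volatile(void)
{
	return demo_read(volatile, demo_counter);
}
```

Both variants leave the same final value in memory; the difference only shows up in the generated code, which is why comparing the disassembly of the two functions at -O2 is the useful check here.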
Because while the old code wasn't "asm volatile", even just a *plain*
asm() is certainly a lot more serialized than a normal access.
For example, the asm() version of raw_cpu_write() used "+m" for the
destination modifier, which means that if you did multiple percpu
writes to the same variable, gcc would output multiple asm statements,
because it would see the subsequent ones as reading the old value
(even if they don't *actually* do so).
That's admittedly really just because it uses a common macro for
raw_cpu_write() and the updates (like the percpu_add() code), so the
fact that it uses "+m" instead of "=m" is just a random odd artifact
of the inline asm version, but maybe we have code that ends up working
just by accident.
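
The "+m" versus "=m" difference described above can be sketched as follows. This is a hedged userspace illustration, not the kernel's percpu macro: demo_var, demo_store_rw() and demo_store_wo() are hypothetical names, the asm assumes x86 with AT&T syntax, and a plain C fallback is provided for other targets so the example still builds.

```c
#include <stdint.h>

/* Ordinary global used in place of a percpu variable (assumption). */
static uint32_t demo_var;

#if defined(__x86_64__) || defined(__i386__)
/*
 * "+m": the operand is declared as both read and written. A later
 * store therefore appears to consume the result of an earlier one,
 * so the compiler has to keep every asm statement in the chain.
 */
static inline void demo_store_rw(uint32_t val)
{
	__asm__("movl %1, %0" : "+m" (demo_var) : "r" (val));
}

/*
 * "=m": the operand is declared as write-only. An earlier store whose
 * result is overwritten without being read is dead as far as the
 * compiler can tell, so it is permitted to drop that asm statement.
 */
static inline void demo_store_wo(uint32_t val)
{
	__asm__("movl %1, %0" : "=m" (demo_var) : "r" (val));
}
#else
/* Non-x86 fallback: plain stores, no constraint behaviour to show. */
static inline void demo_store_rw(uint32_t val) { demo_var = val; }
static inline void demo_store_wo(uint32_t val) { demo_var = val; }
#endif
```

Calling demo_store_rw() twice in a row should always yield two mov instructions, while two back-to-back demo_store_wo() calls may be reduced to one. Whether a given gcc version actually performs that elimination is not guaranteed; the constraint only determines what the compiler is allowed to do.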
Also, I'm not sure gcc re-orders asms wrt each other - even when they
aren't marked volatile.
So it might be worth at least a trivial "make everything volatile"
test to see if that affects anything.
Linus