[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFULd4aFmHPUH4p4Up6+7ph95hcHxbJc87w7KDS23D6SKK7jUQ@mail.gmail.com>
Date: Wed, 21 Jan 2026 10:40:50 +0100
From: Uros Bizjak <ubizjak@...il.com>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, Petr Mladek <pmladek@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>, Kees Cook <kees@...nel.org>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>, Nathan Chancellor <nathan@...nel.org>,
Kiryl Shutsemau <kas@...nel.org>, Rick Edgecombe <rick.p.edgecombe@...el.com>,
linux-kernel@...r.kernel.org, linux-coco@...ts.linux.dev, x86@...nel.org
Subject: Re: [PATCH v1 11/14] x86/boot: use __seg_fs and __seg_gs in the
real-mode boot code
On Wed, Jan 21, 2026 at 9:56 AM Uros Bizjak <ubizjak@...il.com> wrote:
>
> On Tue, Jan 20, 2026 at 8:54 PM H. Peter Anvin <hpa@...or.com> wrote:
> >
> > All supported versions of gcc support __seg_fs and __seg_gs now.
> > All supported versions of clang support __seg_fs and __seg_gs too,
> > except for two bugs (as of clang 21, at least):
> >
> > 1. The %fs: and %gs: prefix does not get emitted in inline assembly.
> > 2. An internal compiler error when addressing symbols directly.
> >
> > However, none of these are required in the boot code. Furthermore,
> > this makes it possible to remove the absolute_pointer() hack in the
> > fs/gs access functions.
> >
> > This requires adding a barrier() to a20.c, to prevent the compiler
> > from eliding the load from the aliased memory address.
> >
> > Remove the unused memcmp_[fg]s() functions.
> >
> > Finally, ds() is by necessity constant, so mark the function as such.
> >
> > Signed-off-by: H. Peter Anvin (Intel) <hpa@...or.com>
> > ---
> > arch/x86/boot/a20.c | 1 +
> > arch/x86/boot/boot.h | 81 ++++++++++++++------------------------------
> > 2 files changed, 27 insertions(+), 55 deletions(-)
> >
> > diff --git a/arch/x86/boot/a20.c b/arch/x86/boot/a20.c
> > index 3ab6cd8eaa31..52c3fccdcb70 100644
> > --- a/arch/x86/boot/a20.c
> > +++ b/arch/x86/boot/a20.c
> > @@ -63,6 +63,7 @@ static int a20_test(int loops)
> > while (loops--) {
> > wrgs32(++ctr, A20_TEST_ADDR);
> > io_delay(); /* Serialize and make delay constant */
> > + barrier(); /* Compiler won't know about fs/gs overlap */
> > ok = rdfs32(A20_TEST_ADDR+0x10) ^ ctr;
> > if (ok)
> > break;
>
> This particular issue is not due to the compiler not knowing about
> fs/gs overlap (if the compiler determines that write and read are to
> the same non-volatile address, it would simply remove both, load and
> store), but due to the compiler performing load hoisting and store
> sinking from the loop. The compiler considers these two addresses as
> two different addresses (they are also defined in two different named
> address spaces), and optimizes access to them. So, without barrier(),
> it simply loads the value from A20_TEST_ADDR and A20_TEST_ADDR+0x10
> before the loop, resulting in:
>
> 9: 65 8b 0d 00 02 00 00 mov %gs:0x200,%ecx
> ...
> 16: 64 8b 35 10 02 00 00 mov %fs:0x210,%esi
> ...
> 28: 83 ea 01 sub $0x1,%edx
> 2b: 74 0a je 37 <a20_test_ref+0x37>
> 2d: e6 80 out %al,$0x80
> 2f: 89 d9 mov %ebx,%ecx
> 31: 29 d1 sub %edx,%ecx
> 33: 39 ce cmp %ecx,%esi
> 35: 74 f1 je 28 <a20_test_ref+0x28>
>
> The solution with barrier() introduces memory clobber between store
> and load, so the compiler is now forced to load and store the values
> due to the side effects of the barrier() inbetween. This kind of
> works, but is just a workaround for what really happens. In reality,
> the value at the test address changes "behind the compiler back", IOW
> - variable’s value can change in ways the compiler cannot predict.
>
> My proposal is to use a volatile pointer to an absolute address, so
> the unwanted optimizations are suppressed. The generated code is the
> same as with barrier(), but now the code tells the compiler that every
> read and write to this address must happen exactly as written in the
> source code. Before your patch, the accessors were defined with
> volatile asm, and this is the place where volatile qualifier matters.
> So, my proposed code would read:
>
> #define A20_TEST_ADDR (4*0x80)
>
> #define A20_TEST_GS (*(volatile __seg_gs u32 *)A20_TEST_ADDR)
> #define A20_TEST_FS (*(volatile __seg_fs u32 *)(A20_TEST_ADDR+0x10))
>
> static int a20_test(int loops)
> {
> int saved, ctr;
>
> set_fs(0xffff);
>
> saved = ctr = A20_TEST_GS;
>
> do {
> A20_TEST_GS = ++ctr;
> io_delay(); /* Make constant delay */
> if (A20_TEST_FS != ctr)
> break;
> } while (--loops);
>
> A20_TEST_GS = saved;
> return loops;
> }
Now also in the form of the attached patch vs. yout git hpa/boot2
branch, tested with gcc-15.2.1 and clang-21.1.8.
BR,
Uros.
View attachment "p.diff.txt" of type "text/plain" (1239 bytes)
Powered by blists - more mailing lists