Message-ID: <ZS2PIzzffqflnVoY@google.com>
Date: Mon, 16 Oct 2023 12:29:39 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Uros Bizjak <ubizjak@...il.com>, x86@...nel.org,
linux-kernel@...r.kernel.org, Nadav Amit <namit@...are.com>,
Andy Lutomirski <luto@...nel.org>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
"H . Peter Anvin" <hpa@...or.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH -tip 3/3] x86/percpu: *NOT FOR MERGE* Implement
arch_raw_cpu_ptr() with RDGSBASE
On Mon, Oct 16, 2023, Ingo Molnar wrote:
>
> * Uros Bizjak <ubizjak@...il.com> wrote:
>
> > Sean says:
> > "The instructions are guarded by a CR4 bit, the ucode cost just to check
> > CR4.FSGSBASE is probably non-trivial."
>
> BTW., a side note regarding the very last paragraph and the CR4 bit ucode
> cost, given that SMAP is CR4 controlled too:
>
> #define X86_CR4_FSGSBASE_BIT 16 /* enable RDWRFSGS support */
> #define X86_CR4_FSGSBASE _BITUL(X86_CR4_FSGSBASE_BIT)
> ...
> #define X86_CR4_SMAP_BIT 21 /* enable SMAP support */
> #define X86_CR4_SMAP _BITUL(X86_CR4_SMAP_BIT)
>
> And this modifies the behavior of STAC/CLAC, of which we have ~300
> instances in a defconfig kernel image:
>
> kepler:~/tip> objdump -wdr vmlinux | grep -w 'stac' | wc -l
> 119
>
> kepler:~/tip> objdump -wdr vmlinux | grep -w 'clac' | wc -l
> 188
>
> Are we certain that ucode on modern x86 CPUs checks CR4 for every affected
> instruction?
Not certain at all. I agree the CR4.FSGSBASE thing could be a complete non-issue;
that was just me speculating.
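
For anyone skimming the thread, the difference under discussion boils down to how
the per-CPU base address is obtained.  Rough sketch only, not the actual patch; the
helper names below are made up and the real kernel code has more wrapping around
the per-CPU annotations:

/*
 * On x86-64 the kernel keeps the per-CPU offset in the GS base, so the base
 * can either be loaded from the this_cpu_off per-CPU variable with a
 * segment-relative MOV (current code), or read directly with RDGSBASE
 * (the 3/3 experiment).
 */
static __always_inline void *percpu_base_via_mov(void)
{
	unsigned long base;

	/* Plain %gs-relative load, no CR4-guarded instruction involved. */
	asm ("mov %%gs:this_cpu_off, %0" : "=r" (base));
	return (void *)base;
}

static __always_inline void *percpu_base_via_rdgsbase(void)
{
	unsigned long base;

	/* RDGSBASE #UDs unless CR4.FSGSBASE=1, hence the ucode-cost question. */
	asm ("rdgsbase %0" : "=r" (base));
	return (void *)base;
}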
> Could they perhaps use something faster, such as internal microcode-patching
> (is that a thing?), to turn support for certain instructions on/off when the
> relevant CR4 bit is modified, without having to genuinely access CR4 for
> every instruction executed?
I don't know the exact details, but Intel's VMRESUME ucode flow uses some form of
magic to skip consistency checks that aren't relevant for the current (or target)
mode, *without* using conditional branches. So it's definitely possible/probable
that similar magic is used to expedite things like CPL and CR4 checks.
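
If someone wants a quick empirical data point, a userspace loop along the lines of
the below at least shows the raw RDGSBASE rate on a given part (made-up file name
and iteration count; needs a kernel that enables FSGSBASE, i.e. sets CR4.FSGSBASE,
otherwise the instruction #UDs).  It's only suggestive, since userspace GS base is
normally zero and the kernel's access pattern is different, but it's cheap to run:

/* gcc -O2 -o rdgsbase_bench rdgsbase_bench.c */
#include <stdint.h>
#include <stdio.h>

#define ITERS	(100 * 1000 * 1000UL)

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	unsigned long base = 0, sink = 0, i;
	uint64_t start, end;

	start = rdtsc();
	for (i = 0; i < ITERS; i++) {
		/* #UDs if the kernel hasn't set CR4.FSGSBASE. */
		asm volatile("rdgsbase %0" : "=r" (base));
		sink += base;
	}
	end = rdtsc();

	printf("rdgsbase: %.2f TSC ticks/iter (sink=%lu)\n",
	       (double)(end - start) / ITERS, sink);
	return 0;
}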