Message-ID: <ZSmwYmxjgC0p0wdr@google.com>
Date: Fri, 13 Oct 2023 14:02:26 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Uros Bizjak <ubizjak@...il.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Nadav Amit <namit@...are.com>, Ingo Molnar <mingo@...nel.org>,
Andy Lutomirski <luto@...nel.org>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
"H . Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH tip] x86/percpu: Rewrite arch_raw_cpu_ptr()
On Fri, Oct 13, 2023, Uros Bizjak wrote:
> On Fri, Oct 13, 2023 at 6:04 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > On Wed, Oct 11, 2023, Uros Bizjak wrote:
> > > Additionally, the patch introduces an 'rdgsbase' alternative for CPUs with
> > > X86_FEATURE_FSGSBASE. The rdgsbase instruction *probably* will end up
> > > only decoding in the first decoder etc. But we're talking single-cycle
> > > kind of effects, and the rdgsbase case should be much better from
> > > a cache perspective and might use fewer memory pipeline resources to
> > > offset the fact that it uses an unusual front end decoder resource...
> >
> > The switch to RDGSBASE should be a separate patch, and should come with actual
> > performance numbers.
>
> This *is* the patch to switch to RDGSBASE. The propagation of
> arguments is a nice side-effect of the patch, due to the explicit
> addition of the offset addend to the %gs base. This patch is an
> alternative implementation of [1]
>
> [1] x86/percpu: Use C for arch_raw_cpu_ptr(),
> https://lore.kernel.org/lkml/20231010164234.140750-1-ubizjak@gmail.com/
Me confused, can't you first switch to MOV with tcp_ptr__ += (unsigned long)(ptr),
and then introduce the RDGSBASE alternative?
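
To illustrate the two-step split, here is a rough userspace sketch of the first step only: load the per-CPU offset with a plain move, then add the pointer in C so the compiler can see and fold the addition. Everything with a _sim suffix is a hypothetical stand-in (the kernel reads the offset through %gs, which this model replaces with an ordinary variable); it is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SIM 2	/* hypothetical CPU count for the model */

struct percpu_area_sim {
	int counter;
};

/* The "template" copy that per-CPU pointers are expressed against. */
static struct percpu_area_sim template_area;
/* One private copy per simulated CPU. */
static struct percpu_area_sim cpu_areas[NR_CPUS_SIM];

/* Stand-in for the per-CPU offset the kernel would load via %gs. */
static uintptr_t this_cpu_off_sim;

/* Pretend to run on 'cpu' by pointing the offset at that CPU's copy. */
static void set_current_cpu_sim(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SIM);
	this_cpu_off_sim = (uintptr_t)&cpu_areas[cpu] -
			   (uintptr_t)&template_area;
}

/* Step 1 of the proposed split: plain load, then an explicit add in C. */
static void *raw_cpu_ptr_sim(void *ptr)
{
	uintptr_t tcp_ptr__ = this_cpu_off_sim;	/* the "MOV" */

	tcp_ptr__ += (uintptr_t)ptr;		/* addend visible to the compiler */
	return (void *)tcp_ptr__;
}
```

After set_current_cpu_sim(1), calling raw_cpu_ptr_sim(&template_area.counter) yields the address of cpu_areas[1].counter. An RDGSBASE alternative would then only change how the base in the first statement is obtained, which is why it can land as a separate patch.
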
> Unfortunately, I have no idea on how to measure the impact of such a
> low-level feature, so I'll at least need some guidance. The "gut
> feeling" says that special instruction, intended to support the
> feature, is always better than emulating said feature with a memory
> access.
AIUI, {RD,WR}{FS,GS}BASE were added as faster alternatives to {RD,WR}MSR, not to
accelerate actual accesses to per-CPU data, TLS, etc. E.g. loading a 64-bit base
via a MOV to FS/GS is impossible. And presumably saving a userspace-controlled
base by actually accessing FS/GS is dangerous for one reason or another.
The instructions are guarded by a CR4 bit, and the ucode cost just to check
CR4.FSGSBASE is probably non-trivial.
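
For reference, the guard can be modeled roughly as below. This is only a sketch of the architectural behavior (RDGSBASE faults with #UD when CR4.FSGSBASE, bit 16, is clear); the struct and function names are hypothetical, and it says nothing about what the check actually costs in microcode.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4.FSGSBASE is bit 16 of CR4. */
#define CR4_FSGSBASE_SIM	(UINT64_C(1) << 16)

/* Hypothetical model of the relevant CPU state, not a kernel structure. */
struct cpu_state_sim {
	uint64_t cr4;
	uint64_t gs_base;
};

/*
 * Model of RDGSBASE: returns false (standing in for #UD) when
 * CR4.FSGSBASE is clear, otherwise stores the GS base in *out.
 */
static bool rdgsbase_sim(const struct cpu_state_sim *cpu, uint64_t *out)
{
	if (!(cpu->cr4 & CR4_FSGSBASE_SIM))
		return false;

	*out = cpu->gs_base;
	return true;
}
```
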