[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251104220100.wrorcuok5slqy74u@desk>
Date: Tue, 4 Nov 2025 14:01:00 -0800
From: Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
Josh Poimboeuf <jpoimboe@...nel.org>,
David Kaplan <david.kaplan@....com>,
Sean Christopherson <seanjc@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
Asit Mallick <asit.k.mallick@...el.com>,
Tao Zhang <tao1.zhang@...el.com>
Subject: Re: [PATCH v3 3/3] x86/vmscape: Remove LFENCE from BHB clearing long
loop
On Mon, Nov 03, 2025 at 12:45:35PM -0800, Dave Hansen wrote:
> On 10/27/25 16:43, Pawan Gupta wrote:
> > Long loop is used to clear the branch history when switching from a guest
> > to host userspace. The LFENCE barrier is not required in this case as ring
> > transition itself acts as a barrier.
> >
> > Move the prologue, LFENCE and epilogue out of __CLEAR_BHB_LOOP macro to
> > allow skipping the LFENCE in the long loop variant. Rename the long loop
> > function to clear_bhb_long_loop_no_barrier() to reflect the change.
>
> Too. Much. Assembly.
>
> Is there a reason we can't do more of this in C?
Apart from VMSCAPE, BHB clearing is also required when entering kernel from
system calls. And one of the safety requirement is to absolutely not
execute any indirect call/jmp unless we have cleared the BHB. In a C
implementation we cannot guarantee that the compiler won't generate
indirect branches before the BHB clearing can be done.
> Can we have _one_ assembly function, please? One that takes the loop
> counts? No macros, no duplication functions. Just one:
This seems possible for all the C callers. ASM callers should stick to asm
versions of BHB clearing to guarantee the compiler did not do anything
funky that would break the mitigation.
> void __clear_bhb_loop(int inner, int outer);
>
> Then we have sensible code that looks like this:
>
> void clear_bhb_loop()
> {
> __clear_bhb_loop(inner, outer);
> lfence();
> }
>
> void clear_bhb_loop_nofence()
> {
> __clear_bhb_loop(inner, outer);
> }
>
> We don't need a short and a long *version*. We just have one function
> (or pair of functions) that gets called that works everywhere.
>
> Actually, if you just used global variables and called the assembly one:
>
> extern void clear_bhb_loop_nofence();
>
> then the other implementation would just be:
>
> void clear_bhb_loop()
> {
> __clear_bhb_loop(inner, outer);
> lfence();
> }
>
> Then we have *ONE* assembly function instead of four.
>
> Right? What am I missing?
Overall, these look to be good improvements to me. The only concern is
making sure that we don't inadvertently call the C version from places that
strictly require no indirect branches before BHB clearing.
> Does the LFENCE *need* to be before that last pop and RET?
At syscall entry, VMexit and BPF (for native BHI mitigation), it does not
matter whether the LFENCE is before or after the last RET, if we can
guarantee that there will be no indirect call/jmp before LFENCE. C version
may not be able to provide this guarantee.
For exit-to-userspace (for VMSCAPE), C implementation is perfectly fine
since the goal is to protect userspace.
To summarize, only 1 of the BHB clear callsite can safely use the C
version, while others need to continue to use the assembly version. I do
not anticipate more such callsites that would be okay with indirect
branches before BHB clearing.
I am open to suggestions on making the code more readable while ensuring
the safety.
Powered by blists - more mailing lists