Message-ID: <SA1PR11MB6734B878A371049E274D6E60A8AF2@SA1PR11MB6734.namprd11.prod.outlook.com>
Date: Sun, 21 Jul 2024 18:09:01 +0000
From: "Li, Xin3" <xin3.li@...el.com>
To: Sean Christopherson <seanjc@...gle.com>, "H. Peter Anvin" <hpa@...or.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-doc@...r.kernel.org"
<linux-doc@...r.kernel.org>, "linux-kselftest@...r.kernel.org"
<linux-kselftest@...r.kernel.org>, "pbonzini@...hat.com"
<pbonzini@...hat.com>, "corbet@....net" <corbet@....net>,
"tglx@...utronix.de" <tglx@...utronix.de>, "mingo@...hat.com"
<mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "x86@...nel.org"
<x86@...nel.org>, "shuah@...nel.org" <shuah@...nel.org>,
"vkuznets@...hat.com" <vkuznets@...hat.com>, "peterz@...radead.org"
<peterz@...radead.org>, "Shankar, Ravi V" <ravi.v.shankar@...el.com>,
"xin@...or.com" <xin@...or.com>
Subject: RE: [PATCH v2 09/25] KVM: VMX: Switch FRED RSP0 between host and guest
> On Thu, Jul 18, 2024, H. Peter Anvin wrote:
> > On July 12, 2024 8:12:51 AM PDT, Sean Christopherson <seanjc@...gle.com> wrote:
> > >On Wed, Jul 10, 2024, Xin3 Li wrote:
> > >> > On Wed, Feb 07, 2024, Xin Li wrote:
> > >> > > Switch MSR_IA32_FRED_RSP0 between host and guest in
> > >> > > vmx_prepare_switch_to_{host,guest}().
> > >> > >
> > >> > > MSR_IA32_FRED_RSP0 is used during ring 3 event delivery only,
> > >> > > thus KVM, running on ring 0, can run safely with guest FRED
> > >> > > RSP0, i.e., no need to switch between host/guest FRED RSP0 during VM
> > >> > > entry and exit.
> > >> > >
> > >> > > KVM should switch to host FRED RSP0 before returning to user
> > >> > > level, and switch to guest FRED RSP0 before entering guest mode.
> > >> >
> > >> > Heh, if only KVM had a framework that was specifically designed
> > >> > for context switching MSRs on return to userspace. Translation:
> > >> > please use the
> > >> > user_return_msr() APIs.
> > >>
> > >> IIUC the user return MSR framework works for MSRs that are per CPU
> > >> constants, but like MSR_KERNEL_GS_BASE, MSR_IA32_FRED_RSP0 is a per
> > >> *task* constant, thus we can't use it.
> > >
> > >Ah, in that case, the changelog is very misleading and needs to be fixed.
> > >Alternatively, is the desired RSP0 value tracked anywhere other than the MSR?
> > >E.g. if it's somewhere in task_struct, then kvm_on_user_return()
> > >would restore the current task's desired RSP0. Even if we don't get
> > >fancy, avoiding the RDMSR to get the current task's value would be nice.
> >
> > Hm, perhaps the right thing to do is to always invoke this function
> > before a context switch happens if that happens before return to user space?
>
> Actually, if the _TIF_NEED_RSP0_LOAD doesn't provide a meaningful benefit (or
> y'all just don't want it :-) ),
We want it 😊.

My concern was adding an extra check of (ti_work & _TIF_NEED_RSP0_LOAD)
to arch_exit_to_user_mode_prepare(), which is a hot function. HPA looked
at the function and suggested testing ti_work for zero first, and only
then processing the individual bits in it:
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index fb2809b20b0a..4c78b99060b5 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -47,15 +47,17 @@ static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs)
 static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 						  unsigned long ti_work)
 {
-	if (ti_work & _TIF_USER_RETURN_NOTIFY)
-		fire_user_return_notifiers();
+	if (ti_work) {
+		if (ti_work & _TIF_USER_RETURN_NOTIFY)
+			fire_user_return_notifiers();
 
-	if (unlikely(ti_work & _TIF_IO_BITMAP))
-		tss_update_io_bitmap();
+		if (unlikely(ti_work & _TIF_IO_BITMAP))
+			tss_update_io_bitmap();
 
-	fpregs_assert_state_consistent();
-	if (unlikely(ti_work & _TIF_NEED_FPU_LOAD))
-		switch_fpu_return();
+		fpregs_assert_state_consistent();
+		if (unlikely(ti_work & _TIF_NEED_FPU_LOAD))
+			switch_fpu_return();
+	}
 
 #ifdef CONFIG_COMPAT
 	/*
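For anyone skimming the diff, a minimal user-space model of the same control flow may help. It is only a sketch, not kernel code: the MODEL_TIF_* bit values, struct exit_work_stats and model_exit_to_user_prepare() are hypothetical names, counters stand in for fire_user_return_notifiers(), tss_update_io_bitmap() and switch_fpu_return(), and fpregs_assert_state_consistent() is omitted. It shows that a zero ti_work costs one test of the whole word instead of three per-bit tests.

```c
#include <assert.h>

/*
 * Hypothetical bit values for illustration only; the real _TIF_* masks
 * are defined in arch/x86/include/asm/thread_info.h.
 */
#define MODEL_TIF_USER_RETURN_NOTIFY	(1UL << 0)
#define MODEL_TIF_IO_BITMAP		(1UL << 1)
#define MODEL_TIF_NEED_FPU_LOAD		(1UL << 2)

/* Same definition the kernel uses for its branch hint (GCC/Clang). */
#ifndef unlikely
#define unlikely(x)	__builtin_expect(!!(x), 0)
#endif

/* Counters stand in for the real handlers so the effect is visible. */
struct exit_work_stats {
	unsigned long notifiers;	/* fire_user_return_notifiers() */
	unsigned long io_bitmaps;	/* tss_update_io_bitmap() */
	unsigned long fpu_loads;	/* switch_fpu_return() */
	unsigned long bit_tests;	/* per-bit tests actually executed */
};

/*
 * Model of the patched function: one test of the whole word, and the
 * per-bit tests only when some work is pending.
 */
static void model_exit_to_user_prepare(struct exit_work_stats *s,
				       unsigned long ti_work)
{
	assert(s);
	if (ti_work) {
		s->bit_tests += 3;
		if (ti_work & MODEL_TIF_USER_RETURN_NOTIFY)
			s->notifiers++;
		if (unlikely(ti_work & MODEL_TIF_IO_BITMAP))
			s->io_bitmaps++;
		if (unlikely(ti_work & MODEL_TIF_NEED_FPU_LOAD))
			s->fpu_loads++;
	}
}
```

A zero ti_work leaves bit_tests unchanged, while any nonzero value pays for all three bit tests, so the change only wins if zero is the common case.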
Following that suggestion, I measured how many ti_work values are zero
out of every one million samples during kernel build tests: over 99% are
zero, i.e., a nonzero ti_work is unlikely (unlikely(ti_work)). While
booting a KVM guest the ratio drops to 75%, which is expected. Once the
guest is up and running a kernel build, it is back to 99%.
So this patch at least looks like low-hanging fruit, and I have sent it
to Intel 0day for broader performance testing.
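A measurement of this kind could be approximated offline with something like the following sketch, assuming the ti_work values have already been logged into an array (the logging itself is not shown, and count_zero_per_window() is a hypothetical helper, not a kernel API). It records the number of zero values in each window of consecutive samples, e.g. window = 1000000 for per-million figures; a trailing partial window is ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: for each complete window of `window` consecutive
 * samples, store how many of them were zero into zero_counts[].
 * Returns the number of complete windows written (at most max_windows).
 */
static size_t count_zero_per_window(const unsigned long *samples, size_t n,
				    size_t window, size_t *zero_counts,
				    size_t max_windows)
{
	size_t windows = 0, zeros = 0;

	assert(window > 0);
	for (size_t i = 0; i < n && windows < max_windows; i++) {
		if (samples[i] == 0)
			zeros++;
		/* Close the current window after every `window` samples. */
		if ((i + 1) % window == 0) {
			zero_counts[windows++] = zeros;
			zeros = 0;
		}
	}
	return windows;
}
```

Dividing each count by the window size gives the zero ratio for that window.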
Since context switches are far less frequent than exits to user mode, I
do NOT expect writing MSR_IA32_FRED_RSP0 on exit to user mode, instead of
on context switch, to make a measurable difference, especially on top of
the above patch.
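To make the frequency argument concrete, here is a hedged user-space model of deferring the MSR write to exit-to-user-mode behind a flag in the style of the proposed _TIF_NEED_RSP0_LOAD. All names (MODEL_TIF_NEED_RSP0_LOAD, struct model_task, struct model_cpu, model_switch_to(), model_exit_to_user()) are hypothetical, and the real WRMSR is replaced by a store plus a counter. Because the flag is set only on context switch and cleared after the write, many exits after one switch still cost a single write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for the proposed _TIF_NEED_RSP0_LOAD. */
#define MODEL_TIF_NEED_RSP0_LOAD	(1UL << 3)

/* Simplified task and CPU state; none of these types exist in the kernel. */
struct model_task {
	uint64_t rsp0;			/* desired FRED RSP0 for this task */
	unsigned long ti_flags;
};

struct model_cpu {
	uint64_t fred_rsp0_msr;		/* simulated MSR_IA32_FRED_RSP0 */
	unsigned long msr_writes;	/* counts simulated WRMSRs */
};

/* Context switch: only mark that RSP0 must be reloaded, no MSR write. */
static void model_switch_to(struct model_task *next)
{
	next->ti_flags |= MODEL_TIF_NEED_RSP0_LOAD;
}

/*
 * Exit to user mode: write the MSR only when the flag is set, so the
 * number of writes is bounded by context switches, not by exits.
 */
static void model_exit_to_user(struct model_cpu *cpu, struct model_task *cur)
{
	unsigned long ti_work = cur->ti_flags;

	assert(cpu && cur);
	if (ti_work) {
		if (ti_work & MODEL_TIF_NEED_RSP0_LOAD) {
			cpu->fred_rsp0_msr = cur->rsp0;
			cpu->msr_writes++;
			cur->ti_flags &= ~MODEL_TIF_NEED_RSP0_LOAD;
		}
	}
}
```

In this model, 100 exits after a single switch produce exactly one write; how the real patch sets and clears the flag (for example after KVM loads a guest RSP0) may differ.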