Message-ID: <SA1PR11MB6734B878A371049E274D6E60A8AF2@SA1PR11MB6734.namprd11.prod.outlook.com>
Date: Sun, 21 Jul 2024 18:09:01 +0000
From: "Li, Xin3" <xin3.li@...el.com>
To: Sean Christopherson <seanjc@...gle.com>, "H. Peter Anvin" <hpa@...or.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-doc@...r.kernel.org"
	<linux-doc@...r.kernel.org>, "linux-kselftest@...r.kernel.org"
	<linux-kselftest@...r.kernel.org>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "corbet@....net" <corbet@....net>,
	"tglx@...utronix.de" <tglx@...utronix.de>, "mingo@...hat.com"
	<mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "x86@...nel.org"
	<x86@...nel.org>, "shuah@...nel.org" <shuah@...nel.org>,
	"vkuznets@...hat.com" <vkuznets@...hat.com>, "peterz@...radead.org"
	<peterz@...radead.org>, "Shankar, Ravi V" <ravi.v.shankar@...el.com>,
	"xin@...or.com" <xin@...or.com>
Subject: RE: [PATCH v2 09/25] KVM: VMX: Switch FRED RSP0 between host and
 guest

 
> On Thu, Jul 18, 2024, H. Peter Anvin wrote:
> > On July 12, 2024 8:12:51 AM PDT, Sean Christopherson <seanjc@...gle.com> wrote:
> > >On Wed, Jul 10, 2024, Xin3 Li wrote:
> > >> > On Wed, Feb 07, 2024, Xin Li wrote:
> > >> > > Switch MSR_IA32_FRED_RSP0 between host and guest in
> > >> > > vmx_prepare_switch_to_{host,guest}().
> > >> > >
> > >> > > MSR_IA32_FRED_RSP0 is used during ring 3 event delivery only,
> > >> > > thus KVM, running on ring 0, can run safely with guest FRED
> > >> > > RSP0, i.e., no need to switch between host/guest FRED RSP0 during VM
> > >> > > entry and exit.
> > >> > >
> > >> > > KVM should switch to host FRED RSP0 before returning to user
> > >> > > level, and switch to guest FRED RSP0 before entering guest mode.
> > >> >
> > >> > Heh, if only KVM had a framework that was specifically designed
> > >> > for context switching MSRs on return to userspace.  Translation:
> > >> > please use the user_return_msr() APIs.
> > >>
> > >> IIUC the user return MSR framework works for MSRs that are per CPU
> > >> constants, but like MSR_KERNEL_GS_BASE, MSR_IA32_FRED_RSP0 is a per
> > >> *task* constant, thus we can't use it.
> > >
> > >Ah, in that case, the changelog is very misleading and needs to be fixed.
> > >Alternatively, is the desired RSP0 value tracked anywhere other than the MSR?
> > >E.g. if it's somewhere in task_struct, then kvm_on_user_return()
> > >would restore the current task's desired RSP0.  Even if we don't get
> > >fancy, avoiding the RDMSR to get the current task's value would be nice.
> >
> > Hm, perhaps the right thing to do is to always invoke this function
> > before a context switch happens if that happens before return to user space?
> 
> Actually, if the _TIF_NEED_RSP0_LOAD doesn't provide a meaningful benefit (or
> y'all just don't want it :-) ), 

We want it 😊.

My concern was adding an extra check of (ti_work & _TIF_NEED_RSP0_LOAD)
to the hot function arch_exit_to_user_mode_prepare().  HPA looked at the
function and suggested testing ti_work for zero first and processing the
individual bits only when it is non-zero:

diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index fb2809b20b0a..4c78b99060b5 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -47,15 +47,17 @@ static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs)
 static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
                                                  unsigned long ti_work)
 {
-       if (ti_work & _TIF_USER_RETURN_NOTIFY)
-               fire_user_return_notifiers();
+       if (ti_work) {
+               if (ti_work & _TIF_USER_RETURN_NOTIFY)
+                       fire_user_return_notifiers();

-       if (unlikely(ti_work & _TIF_IO_BITMAP))
-               tss_update_io_bitmap();
+               if (unlikely(ti_work & _TIF_IO_BITMAP))
+                       tss_update_io_bitmap();

-       fpregs_assert_state_consistent();
-       if (unlikely(ti_work & _TIF_NEED_FPU_LOAD))
-               switch_fpu_return();
+               fpregs_assert_state_consistent();
+               if (unlikely(ti_work & _TIF_NEED_FPU_LOAD))
+                       switch_fpu_return();
+       }

 #ifdef CONFIG_COMPAT
        /*

Based on it, I measured how many out of every one million ti_work values
are 0 in kernel build tests: over 99%, i.e., unlikely(ti_work).

While booting a KVM guest, the share drops to 75%, which is expected.  Once
the guest is up and running a kernel build, it is 99% again.
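For anyone who wants to play with the idea outside the kernel, here is a
minimal userspace sketch of the zero test plus the per-million count.  It is
not the kernel code: the WORK_* bit values, struct exit_stats and both
function names are made up for illustration, and the handlers are replaced
by a counter.

```c
#include <assert.h>

/* Hypothetical bit values; the real _TIF_* definitions live in the kernel. */
#define WORK_USER_RETURN_NOTIFY (1UL << 0)
#define WORK_IO_BITMAP          (1UL << 1)
#define WORK_NEED_FPU_LOAD      (1UL << 2)

/* Counters standing in for the instrumentation described above. */
struct exit_stats {
	unsigned long total;   /* calls seen */
	unsigned long zero;    /* calls where ti_work was 0 */
	unsigned long handled; /* individual work bits processed */
};

/* Test ti_work for zero once, then look at the individual bits. */
static inline void exit_prepare_sketch(unsigned long ti_work,
				       struct exit_stats *st)
{
	st->total++;

	if (!ti_work) {
		/* Common case: nothing to do, no per-bit tests at all. */
		st->zero++;
		return;
	}

	if (ti_work & WORK_USER_RETURN_NOTIFY)
		st->handled++; /* stands in for fire_user_return_notifiers() */

	if (ti_work & WORK_IO_BITMAP)
		st->handled++; /* stands in for tss_update_io_bitmap() */

	if (ti_work & WORK_NEED_FPU_LOAD)
		st->handled++; /* stands in for switch_fpu_return() */
}

/* How many of every one million recorded ti_work values were 0. */
static inline unsigned long zero_per_million(const struct exit_stats *st)
{
	assert(st->total != 0);
	return (unsigned long)((unsigned long long)st->zero * 1000000ULL /
			       st->total);
}
```

Feeding it 99 zero values and one value with two bits set reports 990000
zeros per million and two handled bits.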

So at least this patch looks like low-hanging fruit, and I have sent it to
Intel 0day for broader perf tests.

As context switches are way less frequent than exits to user mode, I do
NOT expect it to make a difference if we write MSR_IA32_FRED_RSP0 on exit
to user mode instead of on context switch, especially when we do it on top
of the above patch.
