linux-kernel - RE: [PATCH 15/19] kvm: x86: Save and restore guest XFD

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <BN9PR11MB5276DF25E38EE7C4F4D29F288C729@BN9PR11MB5276.namprd11.prod.outlook.com>
Date:   Sat, 11 Dec 2021 03:07:40 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        "Zhong, Yang" <yang.zhong@...el.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>
CC:     "seanjc@...gle.com" <seanjc@...gle.com>,
        "Nakajima, Jun" <jun.nakajima@...el.com>,
        "jing2.liu@...ux.intel.com" <jing2.liu@...ux.intel.com>,
        "Liu, Jing2" <jing2.liu@...el.com>,
        "Zhong, Yang" <yang.zhong@...el.com>
Subject: RE: [PATCH 15/19] kvm: x86: Save and restore guest XFD_ERR properly

> From: Thomas Gleixner <tglx@...utronix.de>
> Sent: Saturday, December 11, 2021 8:11 AM
> 
> On Tue, Dec 07 2021 at 19:03, Yang Zhong wrote:
> > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> > index 5089f2e7dc22..9811dc98d550 100644
> > --- a/arch/x86/kernel/fpu/core.c
> > +++ b/arch/x86/kernel/fpu/core.c
> > @@ -238,6 +238,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest
> *gfpu)
> >  	fpstate->is_guest	= true;
> >
> >  	gfpu->fpstate		= fpstate;
> > +	gfpu->xfd_err           = XFD_ERR_GUEST_DISABLED;
> 
> This wants to be part of the previous patch, which introduces the field.
> 
> >  	gfpu->user_xfeatures	= fpu_user_cfg.default_features;
> >  	gfpu->user_perm		= fpu_user_cfg.default_features;
> >  	fpu_init_guest_permissions(gfpu);
> > @@ -297,6 +298,7 @@ int fpu_swap_kvm_fpstate(struct fpu_guest
> *guest_fpu, bool enter_guest)
> >  		fpu->fpstate = guest_fps;
> >  		guest_fps->in_use = true;
> >  	} else {
> > +		fpu_save_guest_xfd_err(guest_fpu);
> 
> Hmm. See below.
> 
> >  		guest_fps->in_use = false;
> >  		fpu->fpstate = fpu->__task_fpstate;
> >  		fpu->__task_fpstate = NULL;
> > @@ -4550,6 +4550,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >  		kvm_steal_time_set_preempted(vcpu);
> >  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> >
> > +	if (vcpu->preempted)
> > +		fpu_save_guest_xfd_err(&vcpu->arch.guest_fpu);
> 
> I'm not really exited about the thought of an exception cause register
> in guest clobbered state.
> 
> Aside of that I really have to ask the question why all this is needed?
> 
> #NM in the guest is slow path, right? So why are you trying to optimize
> for it?

This is really good information. The current logic is obviously
based on the assumption that #NM is frequently triggered.

> 
> The straight forward solution to this is:
> 
>     1) Trap #NM and MSR_XFD_ERR write

and #NM vmexit handler should be called in kvm_x86_handle_exit_irqoff()
before preemption is enabled, otherwise there is still a small window
where MSR_XFD_ERR might be clobbered after preemption enable and
before #NM handler is actually called.

> 
>     2) When the guest triggers #NM is takes an VMEXIT and the host
>        does:
> 
>                 rdmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
> 
>        injects the #NM and goes on.
> 
>     3) When the guest writes to MSR_XFD_ERR it takes an VMEXIT and
>        the host does:
> 
>            vcpu->arch.guest_fpu.xfd_err = msrval;
>            wrmsrl(MSR_XFD_ERR, msrval);
> 
>       and goes back.
> 
>     4) Before entering the preemption disabled section of the VCPU loop
>        do:
> 
>            if (vcpu->arch.guest_fpu.xfd_err)
>                       wrmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
> 
>     5) Before leaving the preemption disabled section of the VCPU loop
>        do:
> 
>            if (vcpu->arch.guest_fpu.xfd_err)
>                       wrmsrl(MSR_XFD_ERR, 0);
> 
> It's really that simple and pretty much 0 overhead for the regular case.

Much cleaner.

> 
> If the guest triggers #NM with a high frequency then taking the VMEXITs
> is the least of the problems. That's not a realistic use case, really.
> 
> Hmm?
> 
> Thanks,
> 
>         tglx

Thanks
Kevin