Message-ID: <BN9PR11MB527653AF0BEC5BBA007B72408C449@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Wed, 29 Dec 2021 03:35:27 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: "Christopherson, Sean" <seanjc@...gle.com>,
"Liu, Jing2" <jing2.liu@...el.com>
CC: "x86@...nel.org" <x86@...nel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"corbet@....net" <corbet@....net>,
"shuah@...nel.org" <shuah@...nel.org>,
"Nakajima, Jun" <jun.nakajima@...el.com>,
"jing2.liu@...ux.intel.com" <jing2.liu@...ux.intel.com>,
"Zeng, Guang" <guang.zeng@...el.com>,
"Wang, Wei W" <wei.w.wang@...el.com>,
"Zhong, Yang" <yang.zhong@...el.com>
Subject: RE: [PATCH v3 22/22] kvm: x86: Disable interception for IA32_XFD on
demand
> From: Sean Christopherson <seanjc@...gle.com>
> Sent: Wednesday, December 29, 2021 9:05 AM
>
> On Wed, Dec 22, 2021, Jing Liu wrote:
> > @@ -1968,6 +1969,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  	case MSR_IA32_XFD:
> >  		ret = kvm_set_msr_common(vcpu, msr_info);
> >  		if (!ret && data) {
> > +			vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_RW);
> > +			vcpu->arch.xfd_out_of_sync = true;
>
> xfd_out_of_sync is a poor name, as XFD _may_ be out of sync, or it may not.
> It's
> also confusing that it's kept set after XFD is explicitly synchronized in
> vcpu_enter_guest().
Yes, sync_xfd_after_exit might be a more accurate name.
>
> > +
> > vcpu->arch.trap_nm = true;
> > vmx_update_exception_bitmap(vcpu);
>
> Ah, this is why #NM interception was made sticky many patches ago. More
> at the end.
>
> > }
> > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> > index bf9d3051cd6c..0a00242a91e7 100644
> > --- a/arch/x86/kvm/vmx/vmx.h
> > +++ b/arch/x86/kvm/vmx/vmx.h
> > @@ -340,7 +340,7 @@ struct vcpu_vmx {
> > struct lbr_desc lbr_desc;
> >
> > /* Save desired MSR intercept (read: pass-through) state */
> > -#define MAX_POSSIBLE_PASSTHROUGH_MSRS 14
> > +#define MAX_POSSIBLE_PASSTHROUGH_MSRS 15
> >  	struct {
> >  		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
> >  		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 3b756ff13103..10a08aa2aa45 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10024,6 +10024,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > if (vcpu->arch.guest_fpu.xfd_err)
> > wrmsrl(MSR_IA32_XFD_ERR, 0);
> >
> > + if (vcpu->arch.xfd_out_of_sync)
>
> Rather than adding a flag that tracks whether or not the MSR can be written
> by the guest, can't this be:
>
> 	if (!vmx_test_msr_bitmap_write(vcpu->loaded_vmcs->msr_bitmap))
> 		fpu_sync_guest_vmexit_xfd_state();
>
Yes, we can use this check instead.
> That might be marginally slower than checking a dedicated flag? But it has
> the advantage of doing the correct thing for nested guests instead of
> penalizing them with an unnecessary sync on every exit. If performance of
> the check is an issue, we could add a static key to skip the code unless at
> least one vCPU has triggered the XFD crud, a la kvm_has_noapic_vcpu (which
> may or may not provide any real performance benefits).
>
> Speaking of nested, interception of #NM in vmx_update_exception_bitmap() is
> wrong with respect to nested guests. Until XFD is supported for L2, which I
> didn't see in this series, #NM should not be intercepted while L2 is running.
Can you remind me what additional work is required to support XFD for L2?
If it is only about performance, I prefer the current conservative approach
as the first step. As explained earlier, #NM should be rare if the guest
doesn't run AMX applications at all. Adding nested into this picture doesn't
make things much worse.
>
> For the earlier patch that introduced arch.trap_nm, if it's not too gross
> and not racy, the code could be:
>
> if (is_guest_mode(vcpu))
> eb |= get_vmcs12(vcpu)->exception_bitmap;
> else {
> ...
>
> if (vcpu->arch.guest_fpu.fpstate.xfd)
> eb |= (1u << NM_VECTOR);
> }
>
> Though I'm ok with a semi-temporary flag if that's gross/racy.
>
> Then this patch can change it to:
>
> if (is_guest_mode(vcpu))
> eb |= get_vmcs12(vcpu)->exception_bitmap;
> else {
> ...
>
> if (!vmx_test_msr_bitmap_write(vcpu->vmcs01.msr_bitmap))
> eb |= (1u << NM_VECTOR);
> }
>
> > + fpu_sync_guest_vmexit_xfd_state();
> > +
> > /*
> > * Consume any pending interrupts, including the possible source of
> > * VM-Exit on SVM and any ticks that occur between VM-Exit and
> now.
> > --
> > 2.27.0
> >
Thanks
Kevin