Date:   Thu, 30 Dec 2021 01:28:01 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     "Christopherson,, Sean" <seanjc@...gle.com>
CC:     "Liu, Jing2" <jing2.liu@...el.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "corbet@....net" <corbet@....net>,
        "shuah@...nel.org" <shuah@...nel.org>,
        "Nakajima, Jun" <jun.nakajima@...el.com>,
        "jing2.liu@...ux.intel.com" <jing2.liu@...ux.intel.com>,
        "Zeng, Guang" <guang.zeng@...el.com>,
        "Wang, Wei W" <wei.w.wang@...el.com>,
        "Zhong, Yang" <yang.zhong@...el.com>
Subject: RE: [PATCH v3 22/22] kvm: x86: Disable interception for IA32_XFD on
 demand

> From: Sean Christopherson <seanjc@...gle.com>
> Sent: Thursday, December 30, 2021 1:26 AM
> 
> On Wed, Dec 29, 2021, Tian, Kevin wrote:
> > > From: Tian, Kevin
> > > Sent: Wednesday, December 29, 2021 11:35 AM
> > > >
> > > > Speaking of nested, interception of #NM in vmx_update_exception_bitmap()
> > > > is wrong with respect to nested guests.  Until XFD is supported for L2,
> > > > which I didn't see in this series, #NM should not be intercepted while
> > > > L2 is running.
> > >
> > > Can you remind what additional thing is required to support XFD for L2?
> >
> > ok, at least we require additional work on when to disable write
> > interception.
> > It can be done only when both L0 and L1 make the same decision, just like
> > what we did for many other VMCS settings...
> 
> I'm not terribly concerned with exposing XFD to L2 right now, I'm much more
> concerned with not breaking nVMX when XFD is exposed to L1.
> 
> > > If it's only about performance, I prefer the current conservative
> > > approach as the first step. As explained earlier, #NM should be rare if
> > > the guest doesn't run AMX applications at all. Adding nested into this
> > > picture doesn't make things much worse.
> >
> > All your comments are incorporated except this one. As said, always
> > trapping #NM for L2 is not a big problem, as long as it's rare and doesn't
> > break functionality. Usually a relatively static scheme is safer than a
> > dynamic one in case anything is overlooked in the initial implementation. 😊
> 
> I strongly disagree, it's not automagically safer.  Either way, we need to
> validate that KVM does the correct thing with respect to vmcs12.  E.g. does
> KVM correctly reflect the #NM back to L2 when it's not intercepted by L1?
> Does it synthesize a nested VM-Exit when it is intercepted by L1?  The
> testing required doesn't magically go away.
> 
> As posted, there is zero chance that the patches correctly handle #NM in L2,
> because nested_vmx_l0_wants_exit() doesn't prevent an #NM from being
> forwarded to L1.  E.g. if L0 (KVM) and L1 both intercept #NM,
> handle_exception_nm() will queue a #NM for injection and then synthesize a
> nested VM-Exit, i.e. the #NM from L2 will get injected into L1 after the
> nested exit.

Yes, it's incorrect behavior.
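As a sanity check of the ordering problem quoted above, here is a hypothetical
user-space model (not KVM code; all names and the enum are illustrative) that
contrasts the posted behavior, where the #NM is queued before the reflection
check, with the intended one, where the reflection decision comes first:

```c
#include <assert.h>
#include <stdbool.h>

/* Possible fates of an #NM raised while L2 is running. */
enum nm_disposition {
	NM_INJECTED_INTO_L2,	/* L0 handles it, re-injects into L2 */
	NM_NESTED_VMEXIT_TO_L1,	/* reflected to L1 as a nested VM-Exit */
	NM_INJECTED_INTO_L1,	/* the buggy outcome described above */
};

/* Posted ordering: the irqoff handler queues the #NM unconditionally,
 * before nested_vmx_reflect_vmexit() runs, so when L1 intercepts #NM
 * the queued exception follows the nested VM-Exit into L1. */
static enum nm_disposition buggy_nm_path(bool l1_intercepts_nm)
{
	bool nm_queued = true;	/* queued in the #NM handler, too early */

	if (l1_intercepts_nm && nm_queued)
		return NM_INJECTED_INTO_L1;
	return NM_INJECTED_INTO_L2;
}

/* Intended ordering: decide reflection first, and only queue the #NM
 * for L2 when L1 does not want the exit. */
static enum nm_disposition fixed_nm_path(bool l1_intercepts_nm)
{
	if (l1_intercepts_nm)
		return NM_NESTED_VMEXIT_TO_L1;
	return NM_INJECTED_INTO_L2;
}
```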

> 
> That also means handle_exception_nmi_irqoff() => handle_exception_nm() is
> fatally broken on non-XFD hardware, as it will attempt
> RDMSR(MSR_IA32_XFD_ERR) if L1 intercepts #NM, since
> handle_exception_nmi_irqoff() will run before __vmx_handle_exit() =>
> nested_vmx_reflect_vmexit() checks whether L0 or L1 should handle the exit.

Ditto. Thanks for pointing out those facts that we obviously overlooked.

So handle_exception_nm() should neither blindly read xfd_err (the #NM might be
caused by L1 interception on a non-XFD platform) nor blindly queue a #NM
exception (triggered in L2) into L1 when L1 intercepts #NM (and thus expects a
nested VM-exit instead).

The former suggests that reading xfd_err should be made conditional, similar
to what we did in vmx_update_exception_bitmap(). The latter suggests that
queueing the actual exception is better postponed until __vmx_handle_exit().

We are looking at this change now.
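For the first point, a minimal user-space sketch of the conditional read
(again not KVM code; the struct, the stand-in for rdmsrl(), and the helper
name are all assumptions for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of one vCPU, just enough state for the decision. */
struct vcpu_model {
	bool guest_has_xfd;	/* XFD exposed to this guest */
	uint64_t xfd_err;	/* last saved MSR_IA32_XFD_ERR value */
};

/* Would L0 ever consume xfd_err for this #NM?  Mirrors the condition
 * used to set the #NM intercept in vmx_update_exception_bitmap(). */
static bool nm_needs_xfd_err(const struct vcpu_model *v)
{
	return v->guest_has_xfd;
}

static void handle_exception_nm(struct vcpu_model *v, uint64_t hw_xfd_err)
{
	/* If XFD isn't exposed, this #NM can only exist because L1 asked
	 * for it in its exception bitmap: don't touch the MSR, which may
	 * not even exist on non-XFD hardware. */
	if (!nm_needs_xfd_err(v))
		return;
	v->xfd_err = hw_xfd_err;	/* stand-in for rdmsrl(MSR_IA32_XFD_ERR) */
}
```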

And once the #NM handler works correctly for interception by either L0 or L1,
I'm not sure whether we still want to special-case L1 vs. L2 in
vmx_update_exception_bitmap(), since in the end L0 wants to intercept #NM to
save xfd_err for both L1 and L2 (if L1 exposes the XFD capability to L2).
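To make that last point concrete, a hypothetical sketch (not KVM code; helper
names and the bitmap representation are illustrative) of dropping the L1/L2
special case: L0's #NM intercept depends only on whether XFD is exposed, and
for L2 it is simply OR'ed with whatever L1 requested in vmcs12:

```c
#include <assert.h>
#include <stdbool.h>

#define NM_VECTOR_BIT	(1u << 7)	/* #NM is exception vector 7 */

/* L0's wish: intercept #NM whenever XFD is exposed, with no
 * special-casing of whether L1 or L2 is currently running. */
static bool l0_wants_nm_intercept(bool guest_has_xfd)
{
	return guest_has_xfd;
}

/* The effective exception bitmap while L2 runs is the union of L1's
 * vmcs12 setting and L0's own requirements. */
static unsigned int effective_l2_exception_bitmap(bool guest_has_xfd,
						  unsigned int vmcs12_bitmap)
{
	unsigned int bitmap = vmcs12_bitmap;

	if (l0_wants_nm_intercept(guest_has_xfd))
		bitmap |= NM_VECTOR_BIT;
	return bitmap;
}
```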

Thanks
Kevin
