linux-kernel - Re: [PATCH v2 02/11] KVM: nSVM: clean up the copying of V

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Y/jqUgR6zTcptcxw@google.com>
Date:   Fri, 24 Feb 2023 08:48:18 -0800
From:   Sean Christopherson <seanjc@...gle.com>
To:     Maxim Levitsky <mlevitsk@...hat.com>
Cc:     kvm@...r.kernel.org, Sandipan Das <sandipan.das@....com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Jim Mattson <jmattson@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Borislav Petkov <bp@...en8.de>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Josh Poimboeuf <jpoimboe@...nel.org>,
        Daniel Sneddon <daniel.sneddon@...ux.intel.com>,
        Jiaxi Chen <jiaxi.chen@...ux.intel.com>,
        Babu Moger <babu.moger@....com>, linux-kernel@...r.kernel.org,
        Jing Liu <jing2.liu@...el.com>,
        Wyes Karny <wyes.karny@....com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH v2 02/11] KVM: nSVM: clean up the copying of V_INTR bits
 from vmcb02 to vmcb12

On Fri, Feb 24, 2023, Maxim Levitsky wrote:
> On Tue, 2023-01-31 at 01:44 +0000, Sean Christopherson wrote:
> > On Sat, Jan 28, 2023, Sean Christopherson wrote:
> > So I think this over two patches?
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 05d38944a6c0..ad1e70ac8669 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -139,13 +139,18 @@ void recalc_intercepts(struct vcpu_svm *svm)
> >  
> >         if (g->int_ctl & V_INTR_MASKING_MASK) {
> >                 /*
> > -                * Once running L2 with HF_VINTR_MASK, EFLAGS.IF and CR8
> > -                * does not affect any interrupt we may want to inject;
> > -                * therefore, writes to CR8 are irrelevant to L0, as are
> > -                * interrupt window vmexits.
> > +                * If L2 is active and V_INTR_MASKING is enabled in vmcb12,
> > +                * disable intercept of CR8 writes as L2's CR8 does not affect
> > +                * any interrupt KVM may want to inject.
> > +                *
> > +                * Similarly, disable intercept of virtual interrupts (used to
> > +                * detect interrupt windows) if the saved RFLAGS.IF is '0', as
> > +                * the effective RFLAGS.IF for L1 interrupts will never be set
> > +                * while L2 is running (L2's RFLAGS.IF doesn't affect L1 IRQs).
> >                  */
> >                 vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
> > -               vmcb_clr_intercept(c, INTERCEPT_VINTR);
> > +               if (!(svm->vmcb01.ptr->save.rflags & X86_EFLAGS_IF))
> > +                       vmcb_clr_intercept(c, INTERCEPT_VINTR);
> 
> How about instead moving this code to svm_set_vintr?

I considered that, but it doesn't handle the case where INTERCEPT_VINTR was set
in vmcb01 before nested VMRUN, i.e. KVM is waiting for an interrupt window at
the time of L1, and L1 doesn't set RFLAGS.IF=1 prior to VMRUN.

> That is, in the guest mode, if the guest has V_INTR_MASKING_MASK, then
> then a nested VM exit is the next point the interrupt window could open,
> thus we don't set VINTR)
> 
> Or even better put the logic in svm_enable_irq_window (that is avoid
> calling svm_set_vintr in the first place).
> 
> I also think that it worth it to add a warning that 'svm_set_intercept'
> didn't work, that is didn't really set an intercept.

Heh, I had coded that up too, but switched to bailing from svm_set_vintr() if the
intercept was disabled because of the aforementioned scenario.

> In theory that can result in nasty CVEs in addition to logic bugs as you found.

I don't think this can result in a CVE, at least not without even worse bugs in
L1.  KVM uses INTERCEPT_VINTR purely to detect interrupt windows, and failure to
configure an IRQ window would at worst cause KVM to delay an IRQ.  If a missing
or late IRQ lets L2 extract data from L1, then L1 has problems of its own.

> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index b103fe7cbc82..59d2891662ef 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1580,6 +1580,16 @@ static void svm_set_vintr(struct vcpu_svm *svm)
> >  
> >         svm_set_intercept(svm, INTERCEPT_VINTR);
> >  
> > +       /*
> > +        * Recalculating intercepts may have clear the VINTR intercept.  If
> > +        * V_INTR_MASKING is enabled in vmcb12, then the effective RFLAGS.IF
> > +        * for L1 physical interrupts is L1's RFLAGS.IF at the time of VMRUN.
> > +        * Requesting an interrupt window if save.RFLAGS.IF=0 is pointless as
> > +        * interrupts will never be unblocked while L2 is running.
> > +        */
> > +       if (!svm_is_intercept(svm, INTERCEPT_VINTR))
> > +               return;
> 
> This won't be needed if we don't call the svm_set_vintr in the first place.
> 
> > +
> >         /*
> >          * This is just a dummy VINTR to actually cause a vmexit to happen.
> >          * Actual injection of virtual interrupts happens through EVENTINJ.
> > 
> 
> 
> 
> With all this said, I also want to note that this patch has *nothing* to do with VNMI,
> I only added it due to some refactoring, so feel free to drop it from vNMI queue,
> and deal with those bugs separately.

Ya, let's tackle this in a separate series.  I'll circle back to this after rc1
(I'm OOO next week).

Santosh, in the next version of the vNMI series, can you drop any patches that
aren't strictly necessary to enable vNMI?