linux-kernel - Re: [PATCH v3 4/7] KVM: x86: nSVM: support PAUSE filter threshold and count when cpu


Message-ID: <8071f0f0a857b0775f1fb2d1ebd86ffc4fd9096b.camel@redhat.com>
Date:   Tue, 22 Mar 2022 00:11:23 +0200
From:   Maxim Levitsky <mlevitsk@...hat.com>
To:     Jim Mattson <jmattson@...gle.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
        Ingo Molnar <mingo@...hat.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Joerg Roedel <joro@...tes.org>, linux-kernel@...r.kernel.org,
        Wanpeng Li <wanpengli@...cent.com>
Subject: Re: [PATCH v3 4/7] KVM: x86: nSVM: support PAUSE filter threshold
 and count when cpu_pm=on

On Mon, 2022-03-21 at 14:59 -0700, Jim Mattson wrote:
> On Mon, Mar 21, 2022 at 2:36 PM Maxim Levitsky <mlevitsk@...hat.com> wrote:
> > On Wed, 2022-03-09 at 11:07 -0800, Jim Mattson wrote:
> > > On Wed, Mar 9, 2022 at 10:47 AM Paolo Bonzini <pbonzini@...hat.com> wrote:
> > > > On 3/9/22 19:35, Jim Mattson wrote:
> > > > > I didn't think pause filtering was virtualizable, since the value of
> > > > > the internal counter isn't exposed on VM-exit.
> > > > > 
> > > > > On bare metal, for instance, assuming the hypervisor doesn't intercept
> > > > > CPUID, the following code would quickly trigger a PAUSE #VMEXIT with
> > > > > the filter count set to 2.
> > > > > 
> > > > > 1:
> > > > > pause
> > > > > cpuid
> > > > > > jmp 1b
> > > > > 
> > > > > Since L0 intercepts CPUID, however, L2 will exit to L0 on each loop
> > > > > iteration, and when L0 resumes L2, the internal counter will be set to
> > > > > 2 again. L1 will never see a PAUSE #VMEXIT.
> > > > > 
> > > > > How do you handle this?
> > > > > 
> > > > 
> > > > I would expect that the same would happen on an SMI or a host interrupt.
> > > > 
> > > >         1:
> > > >         pause
> > > >         outb %al, $0xb2
> > > >         jmp 1b
> > > > 
> > > > In general a PAUSE vmexit will mostly benefit the VM that is pausing, so
> > > > having a partial implementation would be better than disabling it
> > > > altogether.
> > > 
> > > Indeed, the APM does say, "Certain events, including SMI, can cause
> > > the internal count to be reloaded from the VMCB." However, expanding
> > > that set of events so much that some pause loops will *never* trigger
> > > a #VMEXIT seems problematic. If the hypervisor knew that the PAUSE
> > > filter may not be triggered, it could always choose to exit on every
> > > PAUSE.
> > > 
> > > Having a partial implementation is only better than disabling it
> > > altogether if the L2 pause loop doesn't contain a hidden #VMEXIT to
> > > L0.
> > > 
> > 
> > Hi!
> > 
> > You bring up a very valid point, which I hadn't thought about.
> > 
> > However, after thinking about it, I believe that in practice
> > this is not a show-stopper for exposing this feature to the guest.
> > 
> > 
> > This is what I am thinking:
> > 
> > First, let's assume that L2 is malicious. In that case it can no doubt
> > craft a loop that never causes a PAUSE VM exit.
> > But that isn't a problem: the guest could just as well have used NOP,
> > which cannot be intercepted anyway, so no harm is done.
> > 
> > Now let's assume a non-malicious L2:
> > 
> > 
> > First of all, the problem can only happen when a VM exit is intercepted by L0
> > and not by L1. Neither of the cases above usually meets this criterion, since
> > L1 is highly likely to intercept both CPUID and I/O port access. It is also
> > highly unlikely to allow L2 direct access to L1's MMIO ranges.
> > 
> > Overall, there are very few cases of a deterministic VM exit that is
> > intercepted by L0 but not by L1. If that happens, L1 will not catch the
> > PAUSE loop, which is not much different from missing it because of
> > unsuitable thresholds.
> > 
> > Also note that this is only an optimization: because of the count and
> > threshold, it is not guaranteed to catch all pause loops. In fact, the
> > hypervisor has to guess these values and update them in an attempt to
> > catch as many such loops as it can.
> > 
> > I think overall it is OK to expose this feature to the guest, and it
> > should even improve performance in some cases: without it, nested KVM
> > (at least) currently intercepts every PAUSE.
> 
> Can I at least request that this behavior be documented as a KVM
> virtual CPU erratum?

100%. Do you have a pointer to where it should be documented?
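
For whoever writes the erratum text, the behavior discussed in this thread
can be illustrated with a toy model. This is only a sketch under simplifying
assumptions, not the KVM or hardware implementation: it models the filter
count alone (no threshold), it assumes the internal counter is reloaded on
every resume of L2 from L0, and the names PauseFilter and pauses_until_exit
are made up for illustration.

```python
from typing import Optional


class PauseFilter:
    """Toy model of a PAUSE filter counter (count only, no threshold)."""

    def __init__(self, filter_count: int) -> None:
        self.filter_count = filter_count  # value programmed by L1
        self.counter = filter_count       # internal counter, not visible on exit

    def reload(self) -> None:
        # Assumption: resuming the guest reloads the internal counter.
        self.counter = self.filter_count

    def pause(self) -> bool:
        """Execute one PAUSE; return True if it would cause a PAUSE #VMEXIT."""
        self.counter -= 1
        if self.counter < 0:
            self.reload()
            return True
        return False


def pauses_until_exit(filter_count: int, hidden_l0_exit: bool,
                      max_iterations: int = 1000) -> Optional[int]:
    """Run a 'pause; cpuid; jmp' style loop in L2.

    Returns the number of PAUSEs executed before L1 would see a PAUSE
    #VMEXIT, or None if none is seen within max_iterations.
    """
    flt = PauseFilter(filter_count)
    for iteration in range(1, max_iterations + 1):
        if flt.pause():
            return iteration
        if hidden_l0_exit:
            # The second instruction exits to L0 only; when L0 resumes L2,
            # the internal counter starts over, so progress is lost.
            flt.reload()
    return None


if __name__ == "__main__":
    print("no hidden exit:", pauses_until_exit(2, hidden_l0_exit=False))
    print("hidden L0 exit:", pauses_until_exit(2, hidden_l0_exit=True))
```

In this model, with a filter count of 2, the loop without a hidden exit
reports a PAUSE #VMEXIT on the third PAUSE, while the loop with an exit to
L0 on every iteration never reports one (the function returns None).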

Best regards,
	Maxim Levitsky

> 
> > Best regards,
> >         Maxim Levitsky
> > 
> > 
> > 
> >