linux-kernel - Re: [PATCH v3 4/7] KVM: x86: nSVM: support PAUSE filter threshold and count when cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <abe8584fa3691de1d6ae6c6617b8ea750b30fd1c.camel@redhat.com>
Date:   Mon, 21 Mar 2022 23:36:01 +0200
From:   Maxim Levitsky <mlevitsk@...hat.com>
To:     Jim Mattson <jmattson@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>
Cc:     kvm@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Joerg Roedel <joro@...tes.org>, linux-kernel@...r.kernel.org,
        Wanpeng Li <wanpengli@...cent.com>
Subject: Re: [PATCH v3 4/7] KVM: x86: nSVM: support PAUSE filter threshold
 and count when cpu_pm=on

On Wed, 2022-03-09 at 11:07 -0800, Jim Mattson wrote:
> On Wed, Mar 9, 2022 at 10:47 AM Paolo Bonzini <pbonzini@...hat.com> wrote:
> > On 3/9/22 19:35, Jim Mattson wrote:
> > > I didn't think pause filtering was virtualizable, since the value of
> > > the internal counter isn't exposed on VM-exit.
> > > 
> > > On bare metal, for instance, assuming the hypervisor doesn't intercept
> > > CPUID, the following code would quickly trigger a PAUSE #VMEXIT with
> > > the filter count set to 2.
> > > 
> > > 1:
> > > pause
> > > cpuid
> > > jmp 1
> > > 
> > > Since L0 intercepts CPUID, however, L2 will exit to L0 on each loop
> > > iteration, and when L0 resumes L2, the internal counter will be set to
> > > 2 again. L1 will never see a PAUSE #VMEXIT.
> > > 
> > > How do you handle this?
> > > 
> > 
> > I would expect that the same would happen on an SMI or a host interrupt.
> > 
> >         1:
> >         pause
> >         outl al, 0xb2
> >         jmp 1
> > 
> > In general a PAUSE vmexit will mostly benefit the VM that is pausing, so
> > having a partial implementation would be better than disabling it
> > altogether.
> 
> Indeed, the APM does say, "Certain events, including SMI, can cause
> the internal count to be reloaded from the VMCB." However, expanding
> that set of events so much that some pause loops will *never* trigger
> a #VMEXIT seems problematic. If the hypervisor knew that the PAUSE
> filter may not be triggered, it could always choose to exit on every
> PAUSE.
> 
> Having a partial implementation is only better than disabling it
> altogether if the L2 pause loop doesn't contain a hidden #VMEXIT to
> L0.
> 

Hi!

You bring up a very valid point, which I didn't think about.

However after thinking about this, I think that in practice,
this isn't a show stopper problem for exposing this feature to the guest.

This is what I am thinking:

First lets assume that the L2 is malicious. In this case no doubt
it can craft such a loop which will not VMexit on PAUSE.
But that isn't a problem - instead of this guest could have just used NOP
which is not possible to intercept anyway - no harm is done.

Now lets assume a non malicious L2:

First of all the problem can only happen when a VM exit is intercepted by L0,
and not by L1. Both above cases usually don't pass this criteria since L1 is highly
likely to intercept both CPUID and IO port access. It is also highly unlikely
to allow L2 direct access to L1's mmio ranges.

Overall there are very few cases of deterministic vm exit which is intercepted
by L0 but not L1. If that happens then L1 will not catch the PAUSE loop,
which is not different much from not catching it because of not suitable
thresholds.

Also note that this is an optimization only - due to count and threshold,
it is not guaranteed to catch all pause loops - in fact hypervisor has
to guess these values, and update them in attempt to catch as many such
loops as it can.

I think overall it is OK to expose that feature to the guest
and it should even improve performance in some cases - currently
at least nested KVM intercepts every PAUSE otherwise.

Best regards,
	Maxim Levitsky