linux-kernel - Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests

Message-ID: <f6607513-4cbd-3fa0-1663-5477e855e783@suse.de>
Date:   Mon, 3 Apr 2017 12:04:34 +0200
From:   Alexander Graf <agraf@...e.de>
To:     Radim Krčmář <rkrcmar@...hat.com>,
        Jim Mattson <jmattson@...gle.com>
Cc:     "Michael S. Tsirkin" <mst@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        "Gabriel L. Somlo" <gsomlo@...il.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        the arch/x86 maintainers <x86@...nel.org>,
        Joerg Roedel <joro@...tes.org>, kvm list <kvm@...r.kernel.org>,
        linux-doc@...r.kernel.org
Subject: Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests

On 03/29/2017 02:11 PM, Radim Krčmář wrote:
> 2017-03-28 13:35-0700, Jim Mattson:
>> On Tue, Mar 28, 2017 at 7:28 AM, Radim Krčmář <rkrcmar@...hat.com> wrote:
>>> 2017-03-27 15:34+0200, Alexander Graf:
>>>> On 15/03/2017 22:22, Michael S. Tsirkin wrote:
>>>>> Guests running Mac OS X 10.5, 10.6, and 10.7 (Leopard through Lion)
>>>>> have a problem: unless explicitly provided with the kernel command line
>>>>> argument "idlehalt=0", they implicitly assume MONITOR and MWAIT
>>>>> availability without checking CPUID.
>>>>>
>>>>> We currently emulate that as a NOP, but on VMX we can do better: let
>>>>> the guest stop the CPU until a timer, IPI, or memory change.  The
>>>>> physical CPU stays busy, but that isn't any worse than NOP emulation.
>>>>>
>>>>> Note that MWAIT within guests is not the same as on real hardware,
>>>>> because HLT causes a VM exit while MWAIT doesn't.  For this reason it
>>>>> might not be a good idea to use the regular MWAIT flag in CPUID to
>>>>> signal this capability.  Add a flag in the hypervisor leaf instead.
>>>> So imagine we had proper MWAIT emulation capabilities based on page faults.
>>>> In that case, we could do something as fancy as
>>>>
>>>> Treat MWAIT as pass-through by default
>>>>
>>>> Have a per-vcpu monitor timer 10 times a second in the background that
>>>> checks which instruction we're in
>>>>
>>>> If we're in mwait for the last - say - 1 second, switch to emulated MWAIT,
>>>> if $IP was in non-mwait within that time, reset counter.
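A minimal sketch of the sampling heuristic above, written as plain C rather than real KVM code. The structure vcpu_mwait_state, the function mwait_sample(), and the 10 Hz / 1 second constants are hypothetical; how the timer callback decides that the instruction pointer sits on an MWAIT is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning: sample 10 times a second, switch after ~1 second. */
#define SAMPLE_HZ         10
#define MWAIT_SWITCH_SECS 1
#define SAMPLES_TO_SWITCH (SAMPLE_HZ * MWAIT_SWITCH_SECS)

enum mwait_mode { MWAIT_PASSTHROUGH, MWAIT_EMULATED };

/* Hypothetical per-vCPU bookkeeping; not an existing KVM structure. */
struct vcpu_mwait_state {
	enum mwait_mode mode;
	uint32_t consecutive_mwait_samples;
};

/*
 * Called from the (hypothetical) background timer with the result of
 * checking whether the vCPU's instruction pointer is on an MWAIT.
 */
static void mwait_sample(struct vcpu_mwait_state *s, bool ip_in_mwait)
{
	assert(s != NULL);

	if (!ip_in_mwait) {
		/* $IP was outside MWAIT within the window: reset the counter. */
		s->consecutive_mwait_samples = 0;
		return;
	}

	if (s->consecutive_mwait_samples < SAMPLES_TO_SWITCH)
		s->consecutive_mwait_samples++;

	/* Parked in MWAIT for the whole window: stop passing it through. */
	if (s->consecutive_mwait_samples >= SAMPLES_TO_SWITCH)
		s->mode = MWAIT_EMULATED;
}
```

The sketch only ever moves from pass-through to emulated; the proposal does not say when, or whether, to switch back.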
>>> Or we could reuse external interrupts for sampling.  Exits triggered by
>>> them would check the current instruction (it would probably be best to
>>> limit this to the timer tick), and a sufficient ratio (> 0?) of other
>>> exits would imply that MWAIT is not used.
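The exit-sampling alternative can be sketched the same way. As an assumption, timer-tick exits that land on MWAIT count as evidence of an idle MWAIT loop, while every other exit (including ticks that land elsewhere) counts against it; max_other_permille stands in for the "sufficient ratio", with 0 meaning that any other exit disqualifies MWAIT. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum exit_kind { EXIT_TIMER_TICK, EXIT_OTHER };

/* Hypothetical per-vCPU counters for one sampling window. */
struct mwait_exit_window {
	uint32_t ticks_at_mwait; /* timer-tick exits whose RIP was on MWAIT */
	uint32_t other_exits;    /* all other exits, incl. ticks elsewhere */
};

static void account_exit(struct mwait_exit_window *w, enum exit_kind kind,
			 bool rip_at_mwait)
{
	assert(w != NULL);

	if (kind == EXIT_TIMER_TICK && rip_at_mwait)
		w->ticks_at_mwait++;
	else
		w->other_exits++;
}

/*
 * MWAIT counts as "in use" only if at least one tick caught the guest in
 * MWAIT and the share of other exits does not exceed
 * max_other_permille / 1000 (integer math, no floating point).
 */
static bool mwait_in_use(const struct mwait_exit_window *w,
			 uint32_t max_other_permille)
{
	uint64_t total;

	assert(w != NULL);
	if (w->ticks_at_mwait == 0)
		return false;

	total = (uint64_t)w->ticks_at_mwait + w->other_exits;
	return (uint64_t)w->other_exits * 1000 <=
	       (uint64_t)max_other_permille * total;
}
```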
>>>
>>>> Or instead maybe just reuse the adaptive HLT logic?
>>> Emulated MWAIT is very similar to emulated HLT, so reusing the logic
>>> makes sense.  We would just add new wakeup methods.
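Reusing the HLT path mostly means widening the wakeup condition. A sketch under assumed, hypothetical names: a blocked vCPU wakes on a pending interrupt (timer or IPI) in both cases, and an MWAIT additionally wakes when a write hits the armed monitor. Following the SDM, an MWAIT with no armed monitor does not wait at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum block_reason { BLOCK_HLT, BLOCK_MWAIT };

/* Hypothetical wakeup state for a blocked vCPU. */
struct vcpu_wake_state {
	bool interrupt_pending; /* timer or IPI: the existing HLT wakeup */
	bool monitor_armed;     /* guest executed MONITOR */
	bool monitor_triggered; /* a write hit the armed range */
};

/* Guest executed MONITOR: arm the range and clear any stale trigger. */
static void guest_monitor(struct vcpu_wake_state *v)
{
	assert(v != NULL);
	v->monitor_armed = true;
	v->monitor_triggered = false;
}

/* Called by a (hypothetical) write-trap path when the armed range is written. */
static void monitored_write(struct vcpu_wake_state *v)
{
	assert(v != NULL);
	if (v->monitor_armed)
		v->monitor_triggered = true;
}

static bool vcpu_should_wake(const struct vcpu_wake_state *v,
			     enum block_reason why)
{
	assert(v != NULL);
	if (v->interrupt_pending)
		return true;
	if (why == BLOCK_HLT)
		return false;
	/* MWAIT without an armed monitor falls through, per the SDM. */
	return !v->monitor_armed || v->monitor_triggered;
}
```

The sketch ignores interrupt masking and the MWAIT extension that treats masked interrupts as break events.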
>>>
>>>> Either way, with that we should be able to get super low latency IPIs
>>>> running while still maintaining some sanity on systems which don't have
>>>> dedicated CPUs for workloads.
>>>>
>>>> And we wouldn't need guest modifications, which is a great plus. So older
>>>> guests (and Windows?) could benefit from mwait as well.
>>> There is no need for guest modifications: it could be exposed to the
>>> guest as the standard MWAIT feature, leaving responsibility for the
>>> guest/host impact to the user.
>>>
>>> I think that page-fault based MWAIT would require paravirt if it were
>>> to be enabled by default, because of performance concerns:
>>> Enabling write protection on a page needs a VM exit on all other VCPUs
>>> when beginning monitoring (to reload page permissions and prevent missed
>>> writes).
>>> We'd want to keep trapping writes to the page all the time because
>>> toggling is slow, but this could regress performance for an OS that has
>>> other data accessed by other VCPUs in that page.
>>> No current interface can tell the guest that it should reserve the whole
>>> page instead of what CPUID[5] says and that writes to the monitored page
>>> are not "cheap", but can trigger a VM exit ...
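To make the false-sharing cost concrete: with write protection applied at page granularity, every guest write to the monitored page exits, but only writes to the monitored line should wake the vCPU. The helper below, with hypothetical names and assumed 4 KiB guest pages, classifies a single write; it ignores writes that straddle a line boundary.

```c
#include <assert.h>
#include <stdint.h>

#define GUEST_PAGE_SHIFT 12 /* assumed 4 KiB guest pages */

enum write_trap_result {
	WRITE_NOT_TRAPPED,   /* page not write-protected: no VM exit */
	WRITE_SPURIOUS_EXIT, /* exit taken, but unrelated data was written */
	WRITE_WAKES_MONITOR, /* write hit the monitored line */
};

static enum write_trap_result classify_write(uint64_t monitor_gpa,
					     uint64_t line_size,
					     uint64_t write_gpa)
{
	assert(line_size != 0);

	/* Only the page containing the monitored address is write-protected. */
	if ((write_gpa >> GUEST_PAGE_SHIFT) != (monitor_gpa >> GUEST_PAGE_SHIFT))
		return WRITE_NOT_TRAPPED;
	/* Same monitor line as the armed address: a genuine wakeup. */
	if (write_gpa / line_size == monitor_gpa / line_size)
		return WRITE_WAKES_MONITOR;
	return WRITE_SPURIOUS_EXIT;
}
```

With a 64-byte line, writes elsewhere in the same page are pure overhead; when the line size passed in equals the page size, every trapped write is classified as a wakeup rather than a spurious exit.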
>> CPUID.05H:EBX is supposed to address the false sharing issue. IIRC,
>> VMware Fusion reports 64 in CPUID.05H:EAX and 4096 in CPUID.05H:EBX
>> when running Mac OS X guests. Per Intel's SDM volume 3, section
>> 8.10.5, "To avoid false wake-ups; use the largest monitor line size to
>> pad the data structure used to monitor writes. Software must make sure
>> that beyond the data structure, no unrelated data variable exists in
>> the triggering area for MWAIT. A pad may be needed to avoid this
>> situation." Unfortunately, most operating systems do not follow this
>> advice.
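The padding advice quoted from the SDM can be sketched in C. CPUID.05H reports the smallest monitor-line size in EAX[15:0] and the largest in EBX[15:0]; decode_cpuid5() takes raw register values so it can be tested without executing CPUID, and the 4096-byte constant simply mirrors the VMware Fusion value mentioned above. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID.05H: EAX[15:0] = smallest, EBX[15:0] = largest monitor-line size. */
struct monitor_line_sizes {
	uint16_t smallest;
	uint16_t largest;
};

static struct monitor_line_sizes decode_cpuid5(uint32_t eax, uint32_t ebx)
{
	struct monitor_line_sizes s = {
		.smallest = (uint16_t)(eax & 0xffff),
		.largest = (uint16_t)(ebx & 0xffff),
	};
	return s;
}

/* Round a structure size up to a whole number of the largest monitor lines. */
static size_t pad_to_monitor_line(size_t size, size_t largest)
{
	assert(largest != 0);
	return (size + largest - 1) / largest * largest;
}

/* Compile-time variant, assuming a 4096-byte largest monitor line. */
#define ASSUMED_LARGEST_MONITOR_LINE 4096

struct padded_wakeup_flag {
	_Alignas(ASSUMED_LARGEST_MONITOR_LINE) volatile uint32_t flag;
	unsigned char pad[ASSUMED_LARGEST_MONITOR_LINE - sizeof(uint32_t)];
};

_Static_assert(sizeof(struct padded_wakeup_flag) == ASSUMED_LARGEST_MONITOR_LINE,
	       "flag must fill exactly one monitor line");
```

On GCC or Clang, the raw values can be read with __get_cpuid(5, &eax, &ebx, &ecx, &edx) from <cpuid.h>. A guest that sizes its wakeup flag this way keeps unrelated variables out of the triggering area.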
> Right, EBX provides what we need to expose that the whole page is
> monitored, thanks!

So coming back to the original patch, is there anything that should keep 
us from exposing MWAIT straight into the guest at all times?


Alex