Date: Wed, 29 Mar 2017 14:11:47 +0200
From: Radim Krčmář <rkrcmar@...hat.com>
To: Jim Mattson <jmattson@...gle.com>
Cc: Alexander Graf <agraf@...e.de>,
    "Michael S. Tsirkin" <mst@...hat.com>,
    LKML <linux-kernel@...r.kernel.org>,
    "Gabriel L. Somlo" <gsomlo@...il.com>,
    Paolo Bonzini <pbonzini@...hat.com>,
    Jonathan Corbet <corbet@....net>,
    Thomas Gleixner <tglx@...utronix.de>,
    Ingo Molnar <mingo@...hat.com>,
    "H. Peter Anvin" <hpa@...or.com>,
    the arch/x86 maintainers <x86@...nel.org>,
    Joerg Roedel <joro@...tes.org>,
    kvm list <kvm@...r.kernel.org>,
    linux-doc@...r.kernel.org
Subject: Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests

2017-03-28 13:35-0700, Jim Mattson:
> On Tue, Mar 28, 2017 at 7:28 AM, Radim Krčmář <rkrcmar@...hat.com> wrote:
>> 2017-03-27 15:34+0200, Alexander Graf:
>>> On 15/03/2017 22:22, Michael S. Tsirkin wrote:
>>>> Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem:
>>>> unless explicitly provided with kernel command line argument
>>>> "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
>>>> without checking CPUID.
>>>>
>>>> We currently emulate that as a NOP but on VMX we can do better: let
>>>> guest stop the CPU until timer, IPI or memory change. CPU will be busy
>>>> but that isn't any worse than a NOP emulation.
>>>>
>>>> Note that mwait within guests is not the same as on real hardware
>>>> because halt causes an exit while mwait doesn't. For this reason it
>>>> might not be a good idea to use the regular MWAIT flag in CPUID to
>>>> signal this capability. Add a flag in the hypervisor leaf instead.
>>>
>>> So imagine we had proper MWAIT emulation capabilities based on page faults.
>>> In that case, we could do something as fancy as
>>>
>>> Treat MWAIT as pass-through by default
>>>
>>> Have a per-vcpu monitor timer 10 times a second in the background that
>>> checks which instruction we're in
>>>
>>> If we're in mwait for the last - say - 1 second, switch to emulated MWAIT,
>>> if $IP was in non-mwait within that time, reset counter.
>>
>> Or we could reuse external interrupts for sampling. Exits triggered by
>> them would check for current instruction (probably would be best to
>> limit just to timer tick) and a sufficient ratio (> 0?) of other exits
>> would imply that MWAIT is not used.
>>
>>> Or instead maybe just reuse the adapter hlt logic?
>>
>> Emulated MWAIT is very similar to emulated HLT, so reusing the logic
>> makes sense. We would just add new wakeup methods.
>>
>>> Either way, with that we should be able to get super low latency IPIs
>>> running while still maintaining some sanity on systems which don't have
>>> dedicated CPUs for workloads.
>>>
>>> And we wouldn't need guest modifications, which is a great plus. So older
>>> guests (and Windows?) could benefit from mwait as well.
>>
>> There is no need for guest modifications -- it could be exposed as standard
>> MWAIT feature to the guest, with responsibilities for guest/host-impact
>> on the user.
>>
>> I think that the page-fault based MWAIT would require paravirt if it
>> should be enabled by default, because of performance concerns:
>> Enabling write protection on a page needs a VM exit on all other VCPUs
>> when beginning monitoring (to reload page permissions and prevent missed
>> writes).
>> We'd want to keep trapping writes to the page all the time because
>> toggling is slow, but this could regress performance for an OS that has
>> other data accessed by other VCPUs in that page.
>> No current interface can tell the guest that it should reserve the whole
>> page instead of what CPUID[5] says and that writes to the monitored page
>> are not "cheap", but can trigger a VM exit ...
>
> CPUID.05H:EBX is supposed to address the false sharing issue. IIRC,
> VMware Fusion reports 64 in CPUID.05H:EAX and 4096 in CPUID.05H:EBX
> when running Mac OS X guests. Per Intel's SDM volume 3, section
> 8.10.5, "To avoid false wake-ups; use the largest monitor line size to
> pad the data structure used to monitor writes. Software must make sure
> that beyond the data structure, no unrelated data variable exists in
> the triggering area for MWAIT. A pad may be needed to avoid this
> situation." Unfortunately, most operating systems do not follow this
> advice.

Right, EBX provides what we need to expose that the whole page is
monitored, thanks!

> Unfortunately, most operating systems do not follow this
> advice.

Yeah ... KVM could add yet another heuristic to drop MWAIT emulation and
use hardware if there were many traps while the target was not MWAITING.
It's getting over-complicated, though :/
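
The CPUID.05H point above can be illustrated with a small C sketch. It is
not code from the patch or from KVM, only a minimal illustration of how a
guest could read the monitor-line sizes and pad a monitored structure as the
quoted SDM text advises. The helper names (decode_leaf5,
query_monitor_line_sizes, padded_size) are hypothetical; __get_cpuid() is the
helper from the GCC/Clang <cpuid.h> header, and the sketch assumes the usual
layout of leaf 05H, with the smallest line size in EAX[15:0] and the largest
in EBX[15:0].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang header providing __get_cpuid() */
#endif

/* Monitor-line sizes as enumerated by CPUID leaf 05H (in bytes). */
struct monitor_line_sizes {
    uint32_t smallest;  /* CPUID.05H:EAX[15:0] */
    uint32_t largest;   /* CPUID.05H:EBX[15:0] */
};

/* Hypothetical helper: decode raw EAX/EBX values of CPUID leaf 05H. */
struct monitor_line_sizes decode_leaf5(uint32_t eax, uint32_t ebx)
{
    struct monitor_line_sizes s = { eax & 0xffff, ebx & 0xffff };
    return s;
}

/*
 * Hypothetical helper: query the running CPU.
 * Returns 0 on success, -1 if leaf 05H is not available (or the build
 * target is not x86). A CPU without MONITOR/MWAIT may still report
 * zero sizes, so callers should check the result before padding.
 */
int query_monitor_line_sizes(struct monitor_line_sizes *out)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(5, &eax, &ebx, &ecx, &edx))
        return -1;
    *out = decode_leaf5(eax, ebx);
    return 0;
#else
    (void)out;
    return -1;
#endif
}

/* Round a structure size up to a whole number of monitor lines. */
size_t padded_size(size_t payload, size_t line)
{
    assert(line != 0);
    return (payload + line - 1) / line * line;
}
```

With the values Jim recalls VMware Fusion reporting (64 in EAX, 4096 in EBX),
decode_leaf5(64, 4096) yields a largest line size of 4096, and
padded_size(8, 4096) returns 4096. In other words, a guest that follows the
SDM advice would reserve the whole page for an 8-byte monitored flag, which
is the behaviour a page-fault based MWAIT emulation would want.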