linux-kernel - Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure


Date:   Tue, 30 Jan 2018 13:11:25 +0100
From:   Christian Borntraeger <borntraeger@...ibm.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        David Woodhouse <dwmw2@...radead.org>
Cc:     Arjan van de Ven <arjan@...ux.intel.com>,
        Eduardo Habkost <ehabkost@...hat.com>,
        KarimAllah Ahmed <karahmed@...zon.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andy Lutomirski <luto@...nel.org>,
        Ashok Raj <ashok.raj@...el.com>,
        Asit Mallick <asit.k.mallick@...el.com>,
        Borislav Petkov <bp@...e.de>,
        Dan Williams <dan.j.williams@...el.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "H . Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        Janakarajan Natarajan <Janakarajan.Natarajan@....com>,
        Joerg Roedel <joro@...tes.org>,
        Jun Nakajima <jun.nakajima@...el.com>,
        Laura Abbott <labbott@...hat.com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Radim Krčmář <rkrcmar@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        KVM list <kvm@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        "Dr. David Alan Gilbert" <dgilbert@...hat.com>
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support
 infrastructure

On 01/30/2018 01:23 AM, Linus Torvalds wrote:
[...]
> 
> So I actually have a _different_ question to the virtualization
> people. This includes the vmware people, but it also obviously
> includes the Amazon AWS kind of usage.
> 
> When you're a hypervisor (whether vmware or Amazon), why do you even
> end up caring about these things so much? You're protected from
> meltdown thanks to the virtual environment already having separate
> page tables.  And the "big hammer" approach to spectre would seem to
> be to just make sure the BTB and RSB are flushed at vmexit time - and
> even then you might decide that you really want to just move it to
> vmenter time, and only do it if the VM has changed since last time
> (per CPU).
> 
> Why do you even _care_ about the guest, and how it acts wrt Skylake?
> What you should care about is not so much the guests (which do their
> own thing) but protect guests from each other, no?
> 
> So I'm a bit mystified by some of this discussion within the context
> of virtual machines. I think that is separate from any measures that
> the guest machine may then decide to partake in.
> 
> If you are ever going to migrate to Skylake, I think you should just
> always tell the guests that you're running on Skylake. That way the
> guests will always assume the worst case situation wrt Spectre.
> 
> Maybe that mystification comes from me missing something.

I can only speak for KVM, but I think the hypervisor issues come from
the fact that, for migration purposes, the hypervisor "lies" to the
guest about what kind of CPU it is running on (it has to lie, see
below).

It lies by not announcing features, which avoids random guest crashes
after migration. For example, if you want to migrate back and forth
between a system that has AVX512 and another one that does not, you
must tell the guest that AVX512 is not available, even when it runs on
the capable system.

To protect against new features, the hypervisor only announces features
that it understands.
So you essentially start a VM in QEMU with a given CPU type that is
constructed from a base CPU type plus extra features. Before migration,
the target system is checked to see whether it can run a guest of the
given type; otherwise migration is rejected.
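
As a rough illustration of the scheme described above (this is not
QEMU's or libvirt's actual code), a guest CPU model can be sketched as
a base type plus extra features, with the migration check reduced to a
subset test. The model names, feature names and the BASE_MODELS table
are made-up placeholders.

```python
# Hypothetical sketch: model names and feature sets are illustrative only.
BASE_MODELS = {
    "base-a": frozenset({"sse2", "avx"}),
    "base-b": frozenset({"sse2", "avx", "avx2"}),
}


def guest_features(base, extra=()):
    """Guest CPU model = features of a base type plus explicitly added extras."""
    return BASE_MODELS[base] | frozenset(extra)


def can_migrate(guest, target_host_features):
    """Allow migration only if the target offers every feature the guest was told about."""
    return guest <= frozenset(target_host_features)


guest = guest_features("base-a", extra=["aes"])
print(can_migrate(guest, {"sse2", "avx", "aes", "avx512f"}))  # True
print(can_migrate(guest, {"sse2", "avx"}))                    # False: "aes" is missing
```

Extra features on the target are harmless in this sketch because the
guest was never told about them.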

The management stack also knows things like baselining - basically
creating the best possible guest CPU given a set of hosts.
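
A minimal sketch of the baselining idea, assuming each host is
described by a plain set of feature names: the best guest CPU that can
run everywhere is the intersection of all host feature sets. Real
management stacks also map the result back to a named CPU model, which
this sketch leaves out. The host names and feature lists are
placeholders.

```python
from functools import reduce


def baseline(hosts):
    """Return the features common to every host in a non-empty pool."""
    return reduce(frozenset.intersection, (frozenset(h) for h in hosts))


# Hypothetical host pool; the feature lists are illustrative only.
older_host = {"sse2", "avx", "avx2"}
newer_host = {"sse2", "avx", "avx2", "avx512f"}

print(sorted(baseline([older_host, newer_host])))  # ['avx', 'avx2', 'sse2']
```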

The problem now is: if you have, let's say, Broadwells and Skylakes,
what kind of CPU type do you tell your guest? If you claim Broadwell
but run on Skylake, you prevent the guest from protecting itself,
because the guest does not know that it should do something special.
If you say Skylake, the guest might start using features that
Broadwell does not understand.
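
The first half of that dilemma can be sketched as follows. The model
names come from the text above; the NEEDS_SPECIAL_HANDLING table and
both helper functions are assumptions made for illustration, not
anything a real guest or hypervisor exposes.

```python
# Hypothetical sketch: which microarchitectures need a workaround is an assumption here.
NEEDS_SPECIAL_HANDLING = {"skylake"}


def guest_applies_workaround(advertised_model):
    """The guest can only decide based on the CPU model it was told about."""
    return advertised_model in NEEDS_SPECIAL_HANDLING


def guest_is_exposed(advertised_model, actual_host):
    """Exposed if the real host needs the workaround but the guest skipped it."""
    return (actual_host in NEEDS_SPECIAL_HANDLING
            and not guest_applies_workaround(advertised_model))


print(guest_is_exposed("broadwell", "skylake"))    # True: guest was told the wrong model
print(guest_is_exposed("skylake", "skylake"))      # False
print(guest_is_exposed("broadwell", "broadwell"))  # False
```

Advertising "skylake" everywhere closes this gap, but then the guest
may use features that a Broadwell host cannot provide, so migration to
such a host has to be rejected.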

So I think what we have here is that the current (guest) CPU model
for hypervisors was always designed for architectural features.
Presenting microarchitectural knowledge needed for workarounds does
not seem to be well integrated into hypervisors.

PS: For a list of potential cpus/features look at
https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/cpu/cpu_map.xml