Message-ID: <20191218201002.GE25201@linux.intel.com>
Date:   Wed, 18 Dec 2019 12:10:02 -0800
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Jim Mattson <jmattson@...gle.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Joerg Roedel <joro@...tes.org>, kvm list <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Weijiang Yang <weijiang.yang@...el.com>
Subject: Re: [RFC PATCH] KVM: x86: Disallow KVM_SET_CPUID{2} if the vCPU is
 in guest mode

On Wed, Dec 18, 2019 at 11:38:43AM -0800, Jim Mattson wrote:
> On Wed, Dec 18, 2019 at 9:42 AM Sean Christopherson
> <sean.j.christopherson@...el.com> wrote:
> >
> > Reject KVM_SET_CPUID{2} with -EBUSY if the vCPU is in guest mode (L2) to
> > avoid complications and potentially undesirable KVM behavior.  Allowing
> > userspace to change a guest's capabilities while L2 is active would at
> > best result in unexpected behavior in the guest (L1 or L2), and at worst
> > induce bad KVM behavior by breaking fundamental assumptions regarding
> > transitions between L0, L1 and L2.
> 
> This seems a bit contrived. As long as we're breaking the ABI, can we
> disallow changes to CPUID once the vCPU has been powered on?

I can at least concoct scenarios where changing CPUID after KVM_RUN
provides value, e.g. effectively creating a new VM/vCPU without destroying
the kernel's underlying data structures and without putting the file
descriptors, for performance (especially if KVM avoids its hardware on/off
paths) or sandboxing (process has access to a VM fd, but not /dev/kvm).

A truly contrived, but technically architecturally accurate, scenario would
be modeling SGX interaction with the machine check architecture.  Per the
SDM, #MCs or clearing bits in IA32_MCi_CTL disable SGX, which is reflected
in CPUID:

  Any machine check exception (#MC) that occurs after Intel SGX is first
  enables causes Intel SGX to be disabled, (CPUID.SGX_Leaf.0:EAX[SGX1] == 0)
  It cannot be enabled until after the next reset.

  Any act of clearing bits from '1 to '0 in any of the IA32_MCi_CTL register
  may disable Intel SGX (set CPUID.SGX_Leaf.0:EAX[SGX1] to 0) until the next
  reset.

I doubt a userspace VMM would actively model that behavior, but it's at
least theoretically possible.  Yes, it would technically be possible for
SGX to be disabled while L2 is active, but I don't think it's unreasonable
to require userspace to first force the vCPU out of L2.
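
To make the above concrete, here is a rough, purely illustrative toy model of
that flow.  It is not the RFC patch and not KVM code: the toy_* names, the
struct, and the "force exit to L1" helper are all hypothetical stand-ins.  The
only architectural detail assumed is that SGX1 is enumerated in
CPUID.(EAX=12H,ECX=0):EAX[0].  The sketch shows a CPUID update being refused
with -EBUSY while L2 is active, and a VMM reacting to a modeled #MC by first
leaving L2 and then clearing SGX1.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of this mirrors KVM's real data structures. */
#define SGX1_BIT (1u << 0)	/* CPUID.(EAX=12H,ECX=0):EAX[0] enumerates SGX1 */

struct toy_vcpu {
	bool in_guest_mode;	/* true while L2 is active */
	uint32_t sgx_leaf0_eax;	/* modeled CPUID.SGX_Leaf.0:EAX */
};

/* Models the proposed check: refuse CPUID updates while L2 is active. */
static int toy_set_cpuid_sgx_eax(struct toy_vcpu *vcpu, uint32_t eax)
{
	assert(vcpu != NULL);

	if (vcpu->in_guest_mode)
		return -EBUSY;

	vcpu->sgx_leaf0_eax = eax;
	return 0;
}

/* Stand-in for however a VMM would force the vCPU out of L2 (assumption). */
static void toy_force_exit_to_l1(struct toy_vcpu *vcpu)
{
	vcpu->in_guest_mode = false;
}

/* VMM reaction to a modeled #MC: leave L2 if needed, then clear SGX1. */
static int toy_disable_sgx_on_mc(struct toy_vcpu *vcpu)
{
	int ret = toy_set_cpuid_sgx_eax(vcpu, vcpu->sgx_leaf0_eax & ~SGX1_BIT);

	if (ret == -EBUSY) {
		toy_force_exit_to_l1(vcpu);
		ret = toy_set_cpuid_sgx_eax(vcpu,
					    vcpu->sgx_leaf0_eax & ~SGX1_BIT);
	}
	return ret;
}
```

The retry on -EBUSY is just one way a VMM could cope with the restriction; the
point is only that the extra step is cheap for userspace.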