lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8bdf22c8-9ef1-e526-df36-9073a150669d@redhat.com>
Date:   Wed, 25 Jan 2023 22:46:30 +0100
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Jim Mattson <jmattson@...gle.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        seanjc@...gle.com, stable@...r.kernel.org
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from
 KVM_GET_SUPPORTED_CPUID

On 1/25/23 17:47, Jim Mattson wrote:
>> Part of the definition of the API is that you can take
>> KVM_GET_SUPPORTED_CPUID and pass it to KVM_SET_CPUID2 for all vCPUs.
>> Returning host topology information for a random host vCPU definitely
>> violates the contract.
> 
> You are attempting to rewrite history.  Leaf 0xB was added to > KVM_GET_SUPPORTED_CPUID in commit 0771671749b5 ("KVM: Enhance guest
> cpuid management"), and the only documentation of the
> KVM_GET_SUPPORTED_CPUID ioctl at that time was in the commit message:
> 
>       - KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
>         supports
>
> [...] the intention was to return the
> host topology information for the current logical processor.

The handling of unknown features is so naive in that commit, that I 
don't think it is possible to read anything from the implementation; and 
it certainly should not be a paragon for a future-proof implementation 
of KVM_GET_SUPPORTED_CPUID.

For example, it only hid _known_ CPUID leaves or features and passed the 
unknown ones through, which you'll agree is completely broken.  It also 
didn't try to handle all leaves for which ECX might turn out to be 
significant---which happened for EAX=7 so the commit returns a wrong 
output for CPUID[EAX=7,ECX=0].EAX.

In other words, the only reason it handles 0xB is because it was known 
to have subleaves.

We can get more information about how userspace was intended to use it 
from the qemu-kvm fork, which at the time was practically the only KVM 
userspace.  As of 2009 it was only checking a handful of leaves:

https://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git/tree/target-i386/kvm.c?h=kvm-88#n133

so shall we say that userspace is supposed to build each CPUID leaf one 
by one and use KVM_GET_SUPPORTED_CPUID2 for validation only?  I think 
the first committed documentation agrees: "Userspace can use the 
information returned by this ioctl to construct cpuid information (for 
KVM_SET_CPUID2) that is consistent with hardware, kernel, and userspace 
capabilities, and with user requirements".

However, that's the theory.  "Do not break userspace" also involves 
looking at how userspace *really* uses the API, and make compromises to 
cater to those uses; which is different from rewriting history.

And in practice, people basically stopped reading after "(for 
KVM_SET_CPUID2)".

For example in kvmtool:

	kvm_cpuid->nent = MAX_KVM_CPUID_ENTRIES;
	if (ioctl(vcpu->kvm->sys_fd, KVM_GET_SUPPORTED_CPUID, kvm_cpuid) < 0)
		die_perror("KVM_GET_SUPPORTED_CPUID failed");

	filter_cpuid(kvm_cpuid, vcpu->cpu_id);

	if (ioctl(vcpu->vcpu_fd, KVM_SET_CPUID2, kvm_cpuid) < 0)
		die_perror("KVM_SET_CPUID2 failed");

where filter_cpuid only does minor adjustments that do not include 0xB, 
0x1F and 0x8000001E.  The result is a topology that makes no sense if 
host #vCPUs != guest #vCPUs, and which doesn't include the correct APIC 
id in EDX.

https://github.com/kvmtool/kvmtool/blob/5657dd3e48b41bc6db38fa657994bc0e030fd31f/x86/cpuid.c


crosvm does optionally attempt to pass through leaves 0xB and 0x1F, but 
it fails to adjust the APIC id in EDX.  On the other hand it also passes 
through 0x8000001E if ctx.cpu_config.host_cpu_topology is false, 
incorrectly.  So on one hand this patch breaks host_cpu_topology == 
true, on the other hand it fixes host_cpu_topology == false on AMD 
processors.

https://github.com/google/crosvm/blob/cc79897fc0813ee8412e6395648593898962ec82/x86_64/src/cpuid.rs#L121


The rust-vmm reference hypervisor adjusts the APIC id in EDX for 0xB but 
not for 0x1F.  Apart from that it passes through the host topology 
leaves, again resulting in nonsensical topology for host #vCPUs != guest 
#vCPUs.

https://github.com/rust-vmm/vmm-reference/blob/5cde58bc955afca8a180585a9f01c82d6277a755/src/vm-vcpu-ref/src/x86_64/cpuid.rs


Firecracker, finally, ignores KVM_GET_SUPPORTED_CPUID's output for 0xb 
and 0x8000001E (good!) but fails to do the same for 0x1F, so this patch 
is again a fix of sorts---having all zeroes in 0x1F is better than 
having a value that is valid but inconsistent with 0xB.

https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f178bdf21ececb87caa16c1/src/cpuid/src/transformer/intel.rs#L120
https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f178bdf21ececb87caa16c1/src/cpuid/src/transformer/amd.rs#L88


So basically the only open source userspace that is penalized (but not 
broken, and also partly fixed) by the patch is crosvm.  QEMU doesn't 
care, while firecracker/kvmtool/vmm-reference are a net positive.

Paolo

> Any future changes to either the operational behavior or the
> documented behavior of the ABI surely demand a version bump.
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ