lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 29 Apr 2024 13:28:02 +0800
From: Xiaoyao Li <xiaoyao.li@...el.com>
To: Reinette Chatre <reinette.chatre@...el.com>, isaku.yamahata@...el.com,
 pbonzini@...hat.com, erdemaktas@...gle.com, vkuznets@...hat.com,
 seanjc@...gle.com, vannapurve@...gle.com, jmattson@...gle.com,
 mlevitsk@...hat.com, chao.gao@...el.com, rick.p.edgecombe@...el.com
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V5 3/4] KVM: x86: Add a capability to configure bus
 frequency for APIC timer

On 4/26/2024 6:07 AM, Reinette Chatre wrote:
> From: Isaku Yamahata <isaku.yamahata@...el.com>
> 
> Add KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
> bus clock frequency for APIC timer emulation.
> Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the
> frequency in nanoseconds. When using this capability, the user space
> VMM should configure CPUID leaf 0x15 to advertise the frequency.
> 
> Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
> frequency but ends up getting interrupts at a significantly higher rate.
> 
> The TDX architecture hard-codes the core crystal clock frequency to
> 25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
> does not allow the VMM to override the value.
> 
> In addition, per Intel SDM:
>      "The APIC timer frequency will be the processor’s bus clock or core
>       crystal clock frequency (when TSC/core crystal clock ratio is
>       enumerated in CPUID leaf 0x15) divided by the value specified in
>       the divide configuration register."
> 
> The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
> APIC bus frequency of 1GHz.
> 
> The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
> space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
> enumerated, the guest kernel uses it as the APIC bus frequency. If not,
> the guest kernel measures the frequency based on other known timers like
> the ACPI timer or the legacy PIT. As reported by Vishal the TDX guest
> kernel expects a 25MHz timer frequency but gets timer interrupt more
> frequently due to the 1GHz frequency used by KVM.
> 
> To ensure that the guest doesn't have a conflicting view of the APIC bus
> frequency, allow the userspace to tell KVM to use the same frequency that
> TDX mandates instead of the default 1Ghz.
> 
> There are several options to address this:
> 1. Make the KVM able to configure APIC bus frequency (this series).
>     Pro: It resembles the existing hardware.  The recent Intel CPUs
>          adapts 25MHz.
>     Con: Require the VMM to emulate the APIC timer at 25MHz.
> 2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
>     frequency or not enumerate it.
>     Pro: Any APIC bus frequency is allowed.
>     Con: Deviates from TDX architecture.
> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
>     Con: The kernel ignores CPUID leaf 0x15.
> 4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
>     as 1 GHz.
>     Pro: This has been the virtual APIC frequency for KVM guests for 13
>          years.
>     Pro: This requires changing only one hard-coded constant in TDX.
>     Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
>     Con: Core crystal clock frequency is also used to calculate TSC
>          frequency.
>     Con: If it is configured to value different from hardware, it will
>          break the correctness of INTEL-PT Mini Time Count (MTC) packets
>          in TDs.
> 
> Reported-by: Vishal Annapurve <vannapurve@...gle.com>
> Closes: https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/
> Signed-off-by: Isaku Yamahata <isaku.yamahata@...el.com>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
> Co-developed-by: Reinette Chatre <reinette.chatre@...el.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@...el.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@...el.com>

> ---
> Changes v5:
> - Rename capability KVM_CAP_X86_APIC_BUS_FREQUENCY ->
>    KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li)
> - Add Rick's Reviewed-by tag.
> 
> Changes v4:
> - Rework implementation following Sean's guidance in:
>    https://lore.kernel.org/all/ZdjzIgS6EAeCsUue@google.com/
> - Reword con #2 to acknowledge feedback. (Sean)
> - Add the "con" information from Xiaoyao during earlier review of v2.
> - Rework changelog to address comments related to "bus clock" vs
>    "core crystal clock" frequency. (Xiaoyao)
> - Drop snippet about impact on TSC deadline timer emulation. (Maxim)
> - Drop Maxim Levitsky's "Reviewed-by" tag due to many changes to patch
>    since tag received.
> - Switch "Subject:" to match custom "KVM: X86:" -> "KVM: x86:"
> 
> Changes v3:
> - Added reviewed-by Maxim Levitsky.
> - Minor update of the commit message.
> 
> Changes v2:
> - Add check if vcpu isn't created.
> - Add check if lapic chip is in-kernel emulation.
> - Fix build error for i386.
> - Add document to api.rst.
> - Typo in the commit message.
> 
>   Documentation/virt/kvm/api.rst | 17 +++++++++++++++++
>   arch/x86/kvm/x86.c             | 27 +++++++++++++++++++++++++++
>   include/uapi/linux/kvm.h       |  1 +
>   3 files changed, 45 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f0b76ff5030d..f014cd9b2217 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8063,6 +8063,23 @@ error/annotated fault.
>   
>   See KVM_EXIT_MEMORY_FAULT for more information.
>   
> +7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS
> +-----------------------------------
> +
> +:Architectures: x86
> +:Target: VM
> +:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds
> +:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
> +          frequency or if any vCPUs have been created, -ENXIO if a virtual
> +          local APIC has not been created using KVM_CREATE_IRQCHIP.
> +
> +This capability sets VM's APIC bus clock frequency, used by KVM's in-kernel
> +virtual APIC when emulating APIC timers.  KVM's default value can be retrieved
> +by KVM_CHECK_EXTENSION.
> +
> +Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
> +core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
> +
>   8. Other capabilities.
>   ======================
>   
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 10e6315103f4..fa6954c9a9d2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4715,6 +4715,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   	case KVM_CAP_MEMORY_FAULT_INFO:
>   		r = 1;
>   		break;
> +	case KVM_CAP_X86_APIC_BUS_CYCLES_NS:
> +		r = APIC_BUS_CYCLE_NS_DEFAULT;
> +		break;
>   	case KVM_CAP_EXIT_HYPERCALL:
>   		r = KVM_EXIT_HYPERCALL_VALID_MASK;
>   		break;
> @@ -6755,6 +6758,30 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   		}
>   		mutex_unlock(&kvm->lock);
>   		break;
> +	case KVM_CAP_X86_APIC_BUS_CYCLES_NS: {
> +		u64 bus_cycle_ns = cap->args[0];
> +		u64 unused;
> +
> +		r = -EINVAL;
> +		/*
> +		 * Guard against overflow in tmict_to_ns(). 128 is the highest
> +		 * divide value that can be programmed in APIC_TDCR.
> +		 */
> +		if (!bus_cycle_ns ||
> +		    check_mul_overflow((u64)U32_MAX * 128, bus_cycle_ns, &unused))
> +			break;
> +
> +		r = 0;
> +		mutex_lock(&kvm->lock);
> +		if (!irqchip_in_kernel(kvm))
> +			r = -ENXIO;
> +		else if (kvm->created_vcpus)
> +			r = -EINVAL;
> +		else
> +			kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
> +		mutex_unlock(&kvm->lock);
> +		break;
> +	}
>   	default:
>   		r = -EINVAL;
>   		break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 2190adbe3002..6a4d9432ab11 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -917,6 +917,7 @@ struct kvm_enable_cap {
>   #define KVM_CAP_MEMORY_ATTRIBUTES 233
>   #define KVM_CAP_GUEST_MEMFD 234
>   #define KVM_CAP_VM_TYPES 235
> +#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 236
>   
>   struct kvm_irq_routing_irqchip {
>   	__u32 irqchip;


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ