Message-ID: <510e0391-410b-40be-b556-f91554b8d3a1@redhat.com>
Date: Wed, 12 Mar 2025 13:24:32 +0100
From: Paolo Bonzini <pbonzini@...hat.com>
To: Isaku Yamahata <isaku.yamahata@...el.com>, kvm@...r.kernel.org
Cc: Sean Christopherson <seanjc@...gle.com>, chao.gao@...el.com,
rick.p.edgecombe@...el.com, yan.y.zhao@...el.com,
linux-kernel@...r.kernel.org, isaku.yamahata@...il.com,
Nikunj A Dadhania <nikunj@....com>, Marcelo Tosatti <mtosatti@...hat.com>
Subject: Re: [PATCH 2/2] KVM: x86: Don't allow tsc_offset, tsc_scaling_ratio
to change
On 10/12/24 09:55, Isaku Yamahata wrote:
> Add guest_tsc_protected member to struct kvm_arch_vcpu and prohibit
> changing TSC offset/multiplier when guest_tsc_protected is true.
Thanks Isaku! To match the behavior of the SEV-SNP patches, this is
also needed, which I have added to kvm-coco-queue:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dc2f14a6d8a1..ccde7c2b2248 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3919,7 +3925,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_TSC:
if (msr_info->host_initiated) {
kvm_synchronize_tsc(vcpu, &data);
- } else {
+ } else if (!vcpu->arch.guest_tsc_protected) {
u64 adj = kvm_compute_l1_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
adjust_tsc_offset_guest(vcpu, adj);
vcpu->arch.ia32_tsc_adjust_msr += adj;
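For reference, the vendor side is then expected to opt in by setting the new
flag once the TSC offset/multiplier is fixed. A minimal sketch, assuming a
hypothetical vendor init hook (the actual TDX/SEV-SNP hook names in
kvm-coco-queue may differ):

/*
 * Sketch only: the function name is made up.  The point is that the
 * vendor code sets guest_tsc_protected once the TSC offset/multiplier
 * is owned by the TDX module / SNP firmware, after which the common
 * code above refuses to change it.
 */
static void vendor_vcpu_init_protected_tsc(struct kvm_vcpu *vcpu)
{
	vcpu->arch.guest_tsc_protected = true;
}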
Also, I rewrote the commit message as follows:
Add guest_tsc_protected member to struct kvm_arch_vcpu and prohibit
changing TSC offset/multiplier when guest_tsc_protected is true.
X86 confidential computing technologies define a protected guest TSC so that
the VMM can't change the TSC offset/multiplier once the vCPU is initialized.
SEV-SNP defines Secure TSC as optional, whereas TDX mandates it.
KVM's common x86 code tries to guess or adjust the TSC offset/multiplier for
a more accurate guest TSC and lower TSC interrupt latency in several places:
at vCPU creation (kvm_arch_vcpu_postcreate()), on vCPU migration across
pCPUs (kvm_arch_vcpu_load()), via the vCPU TSC device attributes
(kvm_arch_tsc_set_attr()), and on guest/host writes to the TSC or TSC adjust
MSR (kvm_set_msr_common()).
The current x86 KVM implementation conflicts with protected TSC because the
VMM can't change the TSC offset/multiplier.
Because KVM emulates the TSC timer or the TSC deadline timer with the TSC
offset/multiplier, the TSC timer interrupt is injected into the guest at the
wrong time if the KVM TSC offset differs from what the TDX module
determined.
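(Roughly, KVM derives the guest-visible L1 TSC as in the sketch below,
modeled on kvm_read_l1_tsc()/__scale_tsc() in x86.c; an offset different
from the one the TDX module programmed therefore shifts every emulated TSC
deadline by that same delta. The helper name and parameters here are made
up for illustration.)

/*
 * Rough model of the guest L1 TSC computation; the real code lives in
 * kvm_read_l1_tsc()/__scale_tsc().  A wrong 'offset' skews every value
 * returned here, and with it every emulated TSC (deadline) timer.
 */
static inline u64 l1_tsc(u64 host_tsc, u64 ratio, unsigned int frac_bits,
			 u64 offset)
{
	/* In KVM proper, frac_bits is kvm_caps.tsc_scaling_ratio_frac_bits. */
	return mul_u64_u64_shr(host_tsc, ratio, frac_bits) + offset;
}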
Originally this issue was found by the cyclictest of rt-tests [1], as the
latency in the TDX case was worse than the VMX value plus the TDX SEAMCALL
overhead. It turned out that the KVM TSC offset was different from what the
TDX module had determined.
Disable or ignore the KVM logic that changes/adjusts the TSC
offset/multiplier, thus keeping the KVM TSC offset/multiplier the same as
the value determined by the TDX module. Writes to MSR_IA32_TSC are also
blocked as they amount to a change in the TSC offset.
[1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
Paolo
> Reported-by: Marcelo Tosatti <mtosatti@...hat.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@...el.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/x86.c | 9 ++++++++-
> 2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 61b7e9fe5e57..112b8a4f1860 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1036,6 +1036,7 @@ struct kvm_vcpu_arch {
>
> /* Protected Guests */
> bool guest_state_protected;
> + bool guest_tsc_protected;
>
> /*
> * Set when PDPTS were loaded directly by the userspace without
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 65d871bb5b35..a6cf4422df28 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2587,6 +2587,9 @@ EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_multiplier);
>
> static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
> {
> + if (vcpu->arch.guest_tsc_protected)
> + return;
> +
> trace_kvm_write_tsc_offset(vcpu->vcpu_id,
> vcpu->arch.l1_tsc_offset,
> l1_offset);
> @@ -2650,6 +2653,9 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
>
> lockdep_assert_held(&kvm->arch.tsc_write_lock);
>
> + if (vcpu->arch.guest_tsc_protected)
> + return;
> +
> if (user_set_tsc)
> vcpu->kvm->arch.user_set_tsc = true;
>
> @@ -5028,7 +5034,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> u64 offset = kvm_compute_l1_tsc_offset(vcpu,
> vcpu->arch.last_guest_tsc);
> kvm_vcpu_write_tsc_offset(vcpu, offset);
> - vcpu->arch.tsc_catchup = 1;
> + if (!vcpu->arch.guest_tsc_protected)
> + vcpu->arch.tsc_catchup = 1;
> }
>
> if (kvm_lapic_hv_timer_in_use(vcpu))