linux-kernel - Re: [PATCH v5] KVM: x86/tsc: Don't sync TSC on the first write in state restoration


Message-ID: <6E4A54F1-B8C0-44AD-B2A9-6EDF7059D0EC@infradead.org>
Date:   Wed, 13 Sep 2023 11:51:46 +0200
From:   David Woodhouse <dwmw2@...radead.org>
To:     Like Xu <like.xu.linux@...il.com>
CC:     Paolo Bonzini <pbonzini@...hat.com>,
        Oliver Upton <oliver.upton@...ux.dev>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Sean Christopherson <seanjc@...gle.com>
Subject: Re: [PATCH v5] KVM: x86/tsc: Don't sync TSC on the first write in state restoration



On 13 September 2023 11:43:56 CEST, Like Xu <like.xu.linux@...il.com> wrote:

>> Why? Can't we treat an explicit zero write just the same as when the kernel does it?
>
>Not sure if it meets your simplified expectations:

Think that looks good, thanks. One minor nit...


>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 6c9c81e82e65..0f05cf90d636 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -2735,20 +2735,35 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
> 			 * kvm_clock stable after CPU hotplug
> 			 */
> 			synchronizing = true;
>-		} else {
>+		} else if (!data || kvm->arch.user_set_tsc) {

If data is zero here, won't the first if() case have been taken, and set synchronizing=true?

So this is equivalent to "else if (kvm->arch.user_set_tsc)". (Which is fine, and is what I intended.)
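
To spell the equivalence out, here is a minimal standalone sketch. It assumes the first if() in that chain tests `data == 0`, which the quoted hunk does not show; the enum and the function names are hypothetical stand-ins for illustration, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for which arm of the if/else chain is taken. */
enum branch { BRANCH_ZERO_WRITE, BRANCH_SLOP_CHECK, BRANCH_NONE };

/* Condition as posted; assumes the first test is "data == 0". */
static enum branch pick_branch_posted(uint64_t data, bool user_set_tsc)
{
	if (data == 0)
		return BRANCH_ZERO_WRITE;
	else if (!data || user_set_tsc)	/* !data can never be true here */
		return BRANCH_SLOP_CHECK;
	return BRANCH_NONE;
}

/* Same chain with the redundant "!data ||" dropped. */
static enum branch pick_branch_simplified(uint64_t data, bool user_set_tsc)
{
	if (data == 0)
		return BRANCH_ZERO_WRITE;
	else if (user_set_tsc)
		return BRANCH_SLOP_CHECK;
	return BRANCH_NONE;
}

/* Assert that both forms pick the same arm for a few sample inputs. */
static void check_branches_agree(void)
{
	const uint64_t samples[] = { 0, 1, 1000, UINT64_MAX };
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		assert(pick_branch_posted(samples[i], false) ==
		       pick_branch_simplified(samples[i], false));
		assert(pick_branch_posted(samples[i], true) ==
		       pick_branch_simplified(samples[i], true));
	}
}
```

Calling check_branches_agree() from a small test main() passes silently, since a zero write always lands in the first arm before the "!data" test is reached.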

> 			u64 tsc_exp = kvm->arch.last_tsc_write +
> 						nsec_to_cycles(vcpu, elapsed);
> 			u64 tsc_hz = vcpu->arch.virtual_tsc_khz * 1000LL;
> 			/*
>-			 * Special case: TSC write with a small delta (1 second)
>-			 * of virtual cycle time against real time is
>-			 * interpreted as an attempt to synchronize the CPU.
>+			 * Here lies UAPI baggage: when a user-initiated TSC write has
>+			 * a small delta (1 second) of virtual cycle time against the
>+			 * previously set vCPU, we assume that they were intended to be
>+			 * in sync and the delta was only due to the racy nature of the
>+			 * legacy API.
>+			 *
>+			 * This trick falls down when restoring a guest which genuinely
>+			 * has been running for less time than the 1 second of imprecision
>+			 * which we allow for in the legacy API. In this case, the first
>+			 * value written by userspace (on any vCPU) should not be subject
>+			 * to this 'correction' to make it sync up with values that only
>+			 * come from the kernel's default vCPU creation. Make the 1-second
>+			 * slop hack only trigger if the user_set_tsc flag is already set.
>+			 *
>+			 * The correct answer is for the VMM not to use the legacy API.
> 			 */
> 			synchronizing = data < tsc_exp + tsc_hz &&
> 					data + tsc_hz > tsc_exp;
> 		}
> 	}
>
>+	if (data)
>+		kvm->arch.user_set_tsc = true;
>+
> 	/*
> 	 * For a reliable TSC, we can match TSC offsets, and for an unstable
> 	 * TSC, we add elapsed time in this computation.  We could let the
>@@ -5536,6 +5551,7 @@ static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu,
> 		tsc = kvm_scale_tsc(rdtsc(), vcpu->arch.l1_tsc_scaling_ratio) + offset;
> 		ns = get_kvmclock_base_ns();
>
>+		kvm->arch.user_set_tsc = true;
> 		__kvm_synchronize_tsc(vcpu, offset, tsc, ns, matched);
> 		raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
>
>
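
For reference, the window test quoted in the first hunk (`data < tsc_exp + tsc_hz && data + tsc_hz > tsc_exp`) can be tried in isolation. This is only a sketch: the helper name `within_one_second` and the 2 GHz figure used in the comments are assumptions for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the expression in the quoted hunk:
 * true when data lies strictly within one second's worth of cycles
 * (tsc_hz) on either side of the expected value tsc_exp.
 */
static bool within_one_second(uint64_t data, uint64_t tsc_exp, uint64_t tsc_hz)
{
	return data < tsc_exp + tsc_hz && data + tsc_hz > tsc_exp;
}

/*
 * Sample values, assuming a 2 GHz guest TSC (tsc_hz = 2000000000):
 *
 *   tsc_exp = 10000000000, data = 11000000000  -> inside the window
 *   tsc_exp = 10000000000, data = 12000000000  -> outside (exactly 1s ahead)
 *   tsc_exp = 10000000000, data =  8000000000  -> outside (exactly 1s behind)
 *
 * The case the comment in the patch describes: a guest that has run for
 * well under a second is restored, so both values are small, e.g.
 * tsc_exp = 1000000000 and data = 1500000000. That is inside the window,
 * so without the user_set_tsc gate the write would be treated as a
 * synchronization attempt.
 */
```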