Message-ID: <5bdb92ab83269b49ad8fbbe8f54df01f6b98ea8f.camel@infradead.org>
Date: Fri, 28 Feb 2025 11:23:41 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Sean Christopherson <seanjc@...gle.com>, Thomas Gleixner
 <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov
 <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
 "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>, Paolo Bonzini
 <pbonzini@...hat.com>, Juergen Gross <jgross@...e.com>,  "K. Y. Srinivasan"
 <kys@...rosoft.com>, Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu
 <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>, Ajay Kaher
 <ajay.kaher@...adcom.com>, Jan Kiszka <jan.kiszka@...mens.com>, Andy
 Lutomirski <luto@...nel.org>, Peter Zijlstra <peterz@...radead.org>, Daniel
 Lezcano <daniel.lezcano@...aro.org>, John Stultz <jstultz@...gle.com>
Cc: linux-kernel@...r.kernel.org, linux-coco@...ts.linux.dev, 
	kvm@...r.kernel.org, virtualization@...ts.linux.dev, 
	linux-hyperv@...r.kernel.org, xen-devel@...ts.xenproject.org, Tom Lendacky
	 <thomas.lendacky@....com>, Nikunj A Dadhania <nikunj@....com>
Subject: Re: [PATCH v2 00/38] x86: Try to wrangle PV clocks vs. TSC

On Wed, 2025-02-26 at 18:18 -0800, Sean Christopherson wrote:
> This... snowballed a bit.
> 
> The bulk of the changes are in kvmclock and TSC, but pretty much every
> hypervisor's guest-side code gets touched at some point.  I am reasonably
> confident in the correctness of the KVM changes.  For all other hypervisors,
> assume it's completely broken until proven otherwise.
>
> Note, I deliberately omitted:
> 
>   Alexey Makhalov <alexey.amakhalov@...adcom.com>
>   jailhouse-dev@...glegroups.com
> 
> from the To/Cc, as those emails bounced on the last version, and I have zero
> desire to get 38*2 emails telling me an email couldn't be delivered.
> 
> The primary goal of this series is (or at least was, when I started) to
> fix flaws with SNP and TDX guests where a PV clock provided by the untrusted
> hypervisor is used instead of the secure/trusted TSC that is controlled by
> trusted firmware.
> 
> The secondary goal is to draft off of the SNP and TDX changes to slightly
> modernize running under KVM.  Currently, KVM guests will use TSC for
> clocksource, but not sched_clock.  And they ignore Intel's CPUID-based TSC
> and CPU frequency enumeration, even when using the TSC instead of kvmclock.
> And if the host provides the core crystal frequency in CPUID.0x15, then KVM
> guests can use that for the APIC timer period instead of manually calibrating
> the frequency.
> 
> Lots more background on the SNP/TDX motivation:
> https://lore.kernel.org/all/20250106124633.1418972-13-nikunj@amd.com

Looks good; thanks for tackling this.

I think there are still some things from my older series at
https://lore.kernel.org/all/20240522001817.619072-1-dwmw2@infradead.org/
which this doesn't address. Specifically, the accuracy and consistency
of what KVM advertises to the guest as the KVM clock. And as the Xen
clock, more to the point — because guests generally *know* that the KVM
clock is awful, but expect better of the Xen clock.

With a sane and consistent TSC, the mul/shift factors that KVM presents
to the guest in the kvmclock structure should basically *never* change.
Not even on live update (or live migration between hosts with the same
host TSC frequency). 
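
For reference, a minimal sketch of how a guest turns a TSC reading into
nanoseconds with that mul/shift pair. The parameter names follow the
fields of struct pvclock_vcpu_time_info; the Python function itself and
the sample values are illustrative assumptions, not code from either
series.

```python
MASK64 = (1 << 64) - 1  # emulate u64 wraparound


def pvclock_ns(tsc, tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift):
    """Convert a guest TSC reading to nanoseconds, pvclock style."""
    delta = (tsc - tsc_timestamp) & MASK64
    # tsc_shift is signed: a negative value shifts right, otherwise left.
    if tsc_shift >= 0:
        delta = (delta << tsc_shift) & MASK64
    else:
        delta >>= -tsc_shift
    # tsc_to_system_mul is a 32-bit fraction, so drop the low 32 bits
    # of the product.
    return (system_time + ((delta * tsc_to_system_mul) >> 32)) & MASK64


# Illustrative values: mul = 2**31 with shift = 0 halves the cycle count,
# which corresponds to a 2 GHz TSC.
ns_2ghz = pvclock_ns(1_000_000 + 4_000_000_000, 1_000_000, 500, 0x80000000, 0)

# Same mul with shift = -1 quarters the cycle count (a 4 GHz TSC).
ns_4ghz = pvclock_ns(4_000_000_000, 0, 0, 0x80000000, -1)
```

As long as the guest TSC frequency is unchanged, nothing in this
conversion needs a different mul or shift; only tsc_timestamp and
system_time form the reference point.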

Take live update as the simple case: serializing the QEMU state and
restarting it immediately, just to update QEMU with the guest
experiencing only a few milliseconds of steal time.

The guest TSC has a fixed arithmetic relationship to the host TSC. That
should *not* change across the live update; not by a single count. 
I don't believe the existing KVM APIs allow userspace to get that right;
the KVM_VCPU_TSC_SCALE ioctl in patch 7 of that series addresses it:
https://lore.kernel.org/all/20240522001817.619072-8-dwmw2@infradead.org/
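
The "fixed arithmetic relationship" can be sketched as a fixed-point
scale plus an offset. Everything below is an assumption made for
illustration: the 48-bit fraction width (the width used by VMX TSC
scaling; SVM differs), the sample numbers, and in particular the "lossy
restore" scenario, which is one hypothetical way the relationship could
slip and not a description of what any particular VMM does.

```python
MASK64 = (1 << 64) - 1
FRAC_BITS = 48  # assumption: fraction width of the scaling ratio


def guest_tsc(host_tsc, ratio, offset):
    """Guest TSC as a fixed function of host TSC: scale, then add offset."""
    return (((host_tsc * ratio) >> FRAC_BITS) + offset) & MASK64


# Illustrative guest running at half the host TSC frequency.
ratio = 1 << (FRAC_BITS - 1)
offset = (-1_000_000) & MASK64

host_at_save = 10_000_000
host_at_restore = 10_000_300  # 300 host cycles pass during the update

saved_guest_tsc = guest_tsc(host_at_save, ratio, offset)

# Exact restore: the same (ratio, offset) pair is put back verbatim, so
# the guest sees the value it would have seen without the update.
expected = guest_tsc(host_at_restore, ratio, offset)

# Hypothetical lossy restore: the saved guest TSC *value* is written back
# as "now", so a new offset is derived from a stale reading.
stale_offset = (saved_guest_tsc - ((host_at_restore * ratio) >> FRAC_BITS)) & MASK64
drift = expected - guest_tsc(host_at_restore, ratio, stale_offset)
```

With these numbers the lossy path leaves the guest TSC 150 counts behind
where it should be; restoring the scale and offset themselves leaves it
unchanged.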

And then the KVM clock should have a fixed arithmetic relationship to
the guest TSC, which should *also* not change. Not even over live
migration — userspace should ensure the guest TSC is as accurate as
possible given NTP synchronisation between the hosts, and then the KVM
clock remains a fixed function of the guest TSC (at least, if the guest
TSC is the same frequency on source and destination). The existing KVM
API doesn't allow userspace to get *that* right either, which is
addressed by Jack's patch 3 of the series:
https://lore.kernel.org/all/20240522001817.619072-4-dwmw2@infradead.org/

The rest of the series is mostly fixing a bunch of places where KVM
gratuitously recalculates the KVM clock that it advertises to the
guest, and the fact that it does so *badly* in some cases, with a loss
of precision that causes errors in the guest. You may already have
addressed some of those; I'll go over my series and see what still
applies on top of yours.
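
As a generic illustration of how a recalculated multiplier can lose
precision, the sketch below derives a simplified 32-bit fractional
multiplier (shift fixed at 0) once from a frequency in Hz and once from
the same frequency truncated to kHz. The frequency, the helper names,
and the choice of kHz truncation as the source of error are all
assumptions for the example; the message above does not say which
calculations lose precision.

```python
NSEC_PER_SEC = 1_000_000_000


def mul_for(freq_hz):
    """Simplified ns-per-cycle multiplier as a 32-bit fraction (shift = 0)."""
    assert freq_hz > NSEC_PER_SEC  # keeps the multiplier below 2**32
    return (NSEC_PER_SEC << 32) // freq_hz


def cycles_to_ns(cycles, mul):
    return (cycles * mul) >> 32


true_hz = 2_999_999_999                 # illustrative TSC frequency
truncated_hz = (true_hz // 1000) * 1000  # same frequency at kHz granularity

cycles_per_day = true_hz * 86_400
error_ns = (cycles_to_ns(cycles_per_day, mul_for(truncated_hz))
            - cycles_to_ns(cycles_per_day, mul_for(true_hz)))
# A rate error of about 0.33 ppm, which accumulates to tens of
# milliseconds over a day.
```

The per-reading error is tiny, but it is a rate error: it grows with
elapsed time, and every recalculation with a slightly different
multiplier shifts the clock the guest observes.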

> 
> v2:
>  - Add struct to hold the TSC CPUID output. [Boris]
>  - Don't pointlessly inline the TSC CPUID helpers. [Boris]
>  - Fix a variable goof in a helper, hopefully for real this time. [Dan]
>  - Collect reviews. [Nikunj]
>  - Override the sched_clock save/restore hooks if and only if a PV clock
>    is successfully registered.
>  - During resume, restore clocksources before reading persistent time.
>  - Clean up more warts created by kvmclock.
>  - Fix more bugs in kvmclock's suspend/resume handling.
>  - Try to harden kvmclock against future bugs.
> 
> v1: https://lore.kernel.org/all/20250201021718.699411-1-seanjc@google.com
> 
> Sean Christopherson (38):
>   x86/tsc: Add a standalone helpers for getting TSC info from CPUID.0x15
>   x86/tsc: Add standalone helper for getting CPU frequency from CPUID
>   x86/tsc: Add helper to register CPU and TSC freq calibration routines
>   x86/sev: Mark TSC as reliable when configuring Secure TSC
>   x86/sev: Move check for SNP Secure TSC support to tsc_early_init()
>   x86/tdx: Override PV calibration routines with CPUID-based calibration
>   x86/acrn: Mark TSC frequency as known when using ACRN for calibration
>   clocksource: hyper-v: Register sched_clock save/restore iff it's
>     necessary
>   clocksource: hyper-v: Drop wrappers to sched_clock save/restore
>     helpers
>   clocksource: hyper-v: Don't save/restore TSC offset when using HV
>     sched_clock
>   x86/kvmclock: Setup kvmclock for secondary CPUs iff CONFIG_SMP=y
>   x86/kvm: Don't disable kvmclock on BSP in syscore_suspend()
>   x86/paravirt: Move handling of unstable PV clocks into
>     paravirt_set_sched_clock()
>   x86/kvmclock: Move sched_clock save/restore helpers up in kvmclock.c
>   x86/xen/time: Nullify x86_platform's sched_clock save/restore hooks
>   x86/vmware: Nullify save/restore hooks when using VMware's sched_clock
>   x86/tsc: WARN if TSC sched_clock save/restore used with PV sched_clock
>   x86/paravirt: Pass sched_clock save/restore helpers during
>     registration
>   x86/kvmclock: Move kvm_sched_clock_init() down in kvmclock.c
>   x86/xen/time: Mark xen_setup_vsyscall_time_info() as __init
>   x86/pvclock: Mark setup helpers and related various as
>     __init/__ro_after_init
>   x86/pvclock: WARN if pvclock's valid_flags are overwritten
>   x86/kvmclock: Refactor handling of PVCLOCK_TSC_STABLE_BIT during
>     kvmclock_init()
>   timekeeping: Resume clocksources before reading persistent clock
>   x86/kvmclock: Hook clocksource.suspend/resume when kvmclock isn't
>     sched_clock
>   x86/kvmclock: WARN if wall clock is read while kvmclock is suspended
>   x86/kvmclock: Enable kvmclock on APs during onlining if kvmclock isn't
>     sched_clock
>   x86/paravirt: Mark __paravirt_set_sched_clock() as __init
>   x86/paravirt: Plumb a return code into __paravirt_set_sched_clock()
>   x86/paravirt: Don't use a PV sched_clock in CoCo guests with trusted
>     TSC
>   x86/tsc: Pass KNOWN_FREQ and RELIABLE as params to registration
>   x86/tsc: Rejects attempts to override TSC calibration with lesser
>     routine
>   x86/kvmclock: Mark TSC as reliable when it's constant and nonstop
>   x86/kvmclock: Get CPU base frequency from CPUID when it's available
>   x86/kvmclock: Get TSC frequency from CPUID when its available
>   x86/kvmclock: Stuff local APIC bus period when core crystal freq comes
>     from CPUID
>   x86/kvmclock: Use TSC for sched_clock if it's constant and non-stop
>   x86/paravirt: kvmclock: Setup kvmclock early iff it's sched_clock
> 
>  arch/x86/coco/sev/core.c           |   9 +-
>  arch/x86/coco/tdx/tdx.c            |  27 ++-
>  arch/x86/include/asm/kvm_para.h    |  10 +-
>  arch/x86/include/asm/paravirt.h    |  16 +-
>  arch/x86/include/asm/tdx.h         |   2 +
>  arch/x86/include/asm/tsc.h         |  20 +++
>  arch/x86/include/asm/x86_init.h    |   2 -
>  arch/x86/kernel/cpu/acrn.c         |   5 +-
>  arch/x86/kernel/cpu/mshyperv.c     |  69 +-------
>  arch/x86/kernel/cpu/vmware.c       |  11 +-
>  arch/x86/kernel/jailhouse.c        |   6 +-
>  arch/x86/kernel/kvm.c              |  39 +++--
>  arch/x86/kernel/kvmclock.c         | 260 +++++++++++++++++++++--------
>  arch/x86/kernel/paravirt.c         |  35 +++-
>  arch/x86/kernel/pvclock.c          |   9 +-
>  arch/x86/kernel/smpboot.c          |   2 +-
>  arch/x86/kernel/tsc.c              | 141 ++++++++++++----
>  arch/x86/kernel/x86_init.c         |   1 -
>  arch/x86/mm/mem_encrypt_amd.c      |   3 -
>  arch/x86/xen/time.c                |  13 +-
>  drivers/clocksource/hyperv_timer.c |  38 +++--
>  include/clocksource/hyperv_timer.h |   2 -
>  kernel/time/timekeeping.c          |   9 +-
>  23 files changed, 487 insertions(+), 242 deletions(-)
> 
> 
> base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3


