Message-ID: <Z6QQjmJE4CiWtUpI@google.com>
Date: Wed, 5 Feb 2025 17:29:50 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Suleiman Souhlal <suleiman@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, 
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	"H. Peter Anvin" <hpa@...or.com>, Chao Gao <chao.gao@...el.com>, 
	David Woodhouse <dwmw2@...radead.org>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	ssouhlal@...ebsd.org
Subject: Re: [PATCH v3 1/3] kvm: Introduce kvm_total_suspend_ns().

On Wed, Feb 05, 2025, Suleiman Souhlal wrote:
> On Tue, Feb 4, 2025 at 4:58 PM Suleiman Souhlal <suleiman@...gle.com> wrote:
> > > Given that s2idle and standby don't reset host TSC, I think the right way to
> > > handle this conundrum is to address the flaw that's noted in the "backwards TSC"
> > > logic, and adjust guest TSC to be fully up-to-date in the S3 (or lower) case.
> > >
> > >          * ......................................  Unfortunately, we can't
> > >          * bring the TSCs fully up to date with real time, as we aren't yet far
> > >          * enough into CPU bringup that we know how much real time has actually
> > >          * elapsed; our helper function, ktime_get_boottime_ns() will be using boot
> > >          * variables that haven't been updated yet.
> > >
> > > I have no idea why commit 0dd6a6edb012 ("KVM: Dont mark TSC unstable due to S4
> > > suspend") hooked kvm_arch_enable_virtualization_cpu() instead of implementing a
> > > PM notifier, but I don't see anything that suggests it was deliberate, i.e. that
> > > KVM *needs* to effectively snapshot guest TSC when onlining CPUs.
> > >
> > > If that wart is fixed, then both kvmclock and TSC will account host suspend time,
> > > and KVM can safely account the suspend time into steal time regardless of which
> > > clock(s) the guest is using.
> >
> > I tried your suggestion of moving this to a PM notifier and I found
> > that it's possible for VCPUs to run after resume but before the PM
> > notifier has been called, because the resume notifiers get called
> > after tasks are unfrozen. Unfortunately that means that if we were to
> > do that, guest TSCs could go backwards.

Ugh.  That explains why KVM hooks the CPU online path.

> > However, I think it should be possible to keep the existing backwards
> > guest TSC prevention code but also use a notifier that further adjusts
> > the guest TSCs to advance time on suspends where the TSC did go
> > backwards. This would make both s2idle and deep suspends behave the
> > same way.
> 
> An alternative might be to block VCPUs from newly entering the guest
> between the pre and post suspend notifiers.
> Otherwise, some of the steal time accounting would have to be done in
> kvm_arch_enable_virtualization_cpu(), to make sure it gets applied on
> the first VCPU run, in case that happens before the resume notifier
> would have fired. But the comment there says we can't call
> ktime_get_boottime_ns() there, so maybe that's not possible.

I don't think the PM notifier approach is viable.  It's simply too late.  Without
a hook in CPU online, KVM can't even tell which VMs/vCPUs were running before the
suspend, i.e. which VMs need to be updated.

One idea would be to simply fast-forward guest TSC to the current time when the vCPU
is loaded after suspend+resume.  E.g. this hack appears to work.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dcd0c12c308e..27b25f8ca413 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4824,7 +4824,16 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
        /* Apply any externally detected TSC adjustments (due to suspend) */
        if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
-               adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
+               u64 kernel_ns, tsc_now;
+
+               if (kvm_get_time_and_clockread(&kernel_ns, &tsc_now)) {
+                       u64 l1_tsc = nsec_to_cycles(vcpu, vcpu->kvm->arch.kvmclock_offset + kernel_ns);
+                       u64 offset = kvm_compute_l1_tsc_offset(vcpu, l1_tsc);
+
+                       kvm_vcpu_write_tsc_offset(vcpu, offset);
+               } else {
+                       adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
+               }
                vcpu->arch.tsc_offset_adjustment = 0;
                kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
        }
