linux-kernel - Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8f3493ca4c0e726d5c3876bb7dd2cfc432d9deaa.camel@infradead.org>
Date:   Wed, 11 Oct 2023 08:18:36 +0100
From:   David Woodhouse <dwmw2@...radead.org>
To:     Sean Christopherson <seanjc@...gle.com>,
        Dongli Zhang <dongli.zhang@...cle.com>
Cc:     Joe Jin <joe.jin@...cle.com>, x86@...nel.org, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, pbonzini@...hat.com,
        tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com
Subject: Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock
 periodically

On Tue, 2023-10-10 at 17:20 -0700, Sean Christopherson wrote:
> On Wed, Oct 04, 2023, Dongli Zhang wrote:
> > > And because that's not enough, on pCPU migration or if the TSC is unstable,
> > > kvm_arch_vcpu_load() requests KVM_REQ_GLOBAL_CLOCK_UPDATE, which schedules
> > > kvmclock_update_fn() with a delay of 100ms.  The large delay is to play nice with
> > > unstable TSCs.  But if KVM is periodically doing clock updates on all vCPU,
> > > scheduling another update with a *longer* delay is silly.
> > 
> > We may need to add above message to the places, where
> > KVM_REQ_GLOBAL_CLOCK_UPDATE is replaced with KVM_REQ_CLOCK_UPDATE in the patch?
> 
> Yeah, comments are most definitely needed, this was just intended to be a quick
> sketch to get the ball rolling.
> 
> > > -static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
> > > -{
> > > -       struct kvm *kvm = v->kvm;
> > > -
> > > -       kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
> > > -       schedule_delayed_work(&kvm->arch.kvmclock_update_work,
> > > -                                       KVMCLOCK_UPDATE_DELAY);
> > > -}
> > > -
> > >  #define KVMCLOCK_SYNC_PERIOD (300 * HZ)
> > 
> > While David mentioned "maximum delta", how about to turn above into a module
> > param with the default 300HZ.
> > 
> > BTW, 300HZ should be enough for vCPU hotplug case, unless people prefer 1-hour
> > or 1-day.
> 
> Hmm, I think I agree with David that it would be better if KVM can take care of
> the gory details and promise a certain level of accuracy.  I'm usually a fan of
> punting complexity to userspace, but requiring every userspace to figure out the
> ideal sync frequency on every platform is more than a bit unfriendly.  And it
> might not even be realistic unless userspace makes assumptions about how the kernel
> computes CLOCK_MONOTONIC_RAW from TSC cycles.
> 

I think perhaps I would rather save up my persuasiveness on the topic
of "let's not make things too awful for userspace to cope with" for the
live update/migration mess. I think I need to dust off that attempt at
fixing our 'how to migrate with clocks intact' documentation from
https://lore.kernel.org/kvm/13f256ad95de186e3b6bcfcc1f88da5d0ad0cb71.camel@infradead.org/
The changes we're discussing here obviously have an effect on migration
too.

Where the host TSC is actually reliable, I would really prefer for the
kvmclock to just be a fixed function of the guest TSC and *not* to be
arbitrarily yanked back[1] to the host's CLOCK_MONOTONIC periodically.

That makes the process of migrating to another machine (potentially
with a different host TSC frequency) a whole lot simpler. Userspace
just needs to know two things from the kernel:

 • the relationship between the guest's TSC and its kvmclock (which we
   helpfully advertise to the *guest* in a pvclock structure, and can
   give that to userspace too). 

 • The value of *either* the guest's TSC or its kvmclock, at a given
   value of CLOCK_TAI (not CLOCK_WALLTIME). Theoretically, this can be
   either TSC or kvmclock. But I think for best precision it would need
   to be TSC?

I am aware that I glibly said "guest TSC" as if there's only one, or
they're in sync. I think we can substitute "the TSC of guest vCPU0" in
the case of migration. We don't want a guest to be able to change its
kvmclock by writing to vCPU0's TSC though, so we may need to keep (and
migrate) an additional offset value for tracking that?

[1] Yes, I believe "back" does happen. I have test failures in my queue
to look at, where guests see the "Xen" clock going backwards.

Download attachment "smime.p7s" of type "application/pkcs7-signature" (5965 bytes)