linux-kernel - Re: [PATCH] KVM: x86/xen: improve accuracy of Xen timers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <6c9671b4-d997-42ac-9821-06accb97357f@xen.org>
Date:   Tue, 31 Oct 2023 12:06:17 +0000
From:   Paul Durrant <xadimgnik@...il.com>
To:     David Woodhouse <dwmw2@...radead.org>,
        Paul Durrant <xadimgnik@...il.com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Cc:     Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH] KVM: x86/xen: improve accuracy of Xen timers

On 31/10/2023 11:42, David Woodhouse wrote:
> 
> 
> On 31 October 2023 10:42:42 GMT, Paul Durrant <xadimgnik@...il.com> wrote:
>> There is no documented ordering requirement on setting KVM_XEN_VCPU_ATTR_TYPE_TIMER versus KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO or KVM_XEN_ATTR_TYPE_SHARED_INFO but kvm_xen_start_timer() now needs the vCPU's pvclock to be valid. Should actually starting the timer not be deferred until then? (Or simply add a check here and have the attribute setting fail if the pvclock is not valid).
> 
> 
> There are no such dependencies and I don't want there to be. That would be the *epitome* of what my "if it needs documenting, fix it first" mantra is intended to correct.
> 
> The fact that this broke on migration because the hv_clock isn't set up yet, as we saw in our overnight testing, is just a bug. In my tree I've fixed it thus:
> 
> index 63531173dad1..e3d2d63eef34 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -182,7 +182,7 @@ static void kvm_xen_start_timer(st
> ruct kvm_vcpu *vcpu, u64 guest_abs,
>           * the absolute CLOCK_MONOTONIC time at which
> the timer should
>           * fire.
>           */
> -       if (vcpu->kvm->arch.use_master_clock &&
> +       if (vcpu->arch.hv_clock.version && vcpu->kvm->
> arch.use_master_clock &&
>              static_cpu_has(X86_FEATURE_CONSTANT_TSC))
> {
>                  uint64_t host_tsc, guest_tsc;
> 
> @@ -206,9 +206,23 @@ static void kvm_xen_start_timer(s
> truct kvm_vcpu *vcpu, u64 guest_abs,
> 
>                  /* Calculate the guest kvmclock as the
>   guest would do it. */
>                  guest_tsc = kvm_read_l1_tsc(vcpu, host
> _tsc);
> -               guest_now = __pvclock_read_cycles(&vcp
> u->arch.hv_clock, guest_tsc);
> +               guest_now = __pvclock_read_cycles(&vcp
> u->arch.hv_clock,
> +                                                 gues
> t_tsc);
>          } else {
> -               /* Without CONSTANT_TSC, get_kvmclock_
> ns() is the only option */
> +               /*
> +                * Without CONSTANT_TSC, get_kvmclock_
> ns() is the only option.
> +                *
> +                * Also if the guest PV clock hasn't b
> een set up yet, as is
> +                * likely to be the case during migrat
> ion when the vCPU has
> +                * not been run yet. It would be possi
> ble to calculate the
> +                * scaling factors properly in that ca
> se but there's not much
> +                * point in doing so. The get_kvmclock
> _ns() drift accumulates
> +                * over time, so it's OK to use it at
> startup. Besides, on
> +                * migration there's going to be a lit
> tle bit of skew in the
> +                * precise moment at which timers fire
>   anyway. Often they'll
> +                * be in the "past" by the time the VM
>   is running again after
> +                * migration.
> +                */
>                  guest_now = get_kvmclock_ns(vcpu->kvm)
> ;
>                  kernel_now = ktime_get();
>          }
> --
> 2.41.0
> 
> We *could* reset the timer when the vCPU starts to run and handles the KVM_REQ_CLOCK_UPDATE event, but I don't want to for two reasons.
> 
> Firstly, we just don't need that complexity. This approach is OK, as the newly-added comment says. And we do need to fix get_kvmclock_ns() anyway, so it should work fine. Most of this patch will still be useful as it uses a single TSC read and we *do* need to do that part even after all the kvmclock brokenness is fixed. But the complexity on KVM_REQ_CLOCK_UPDATE isn't needed in the long term.
> 
> Secondly, it's also wrong thing to do in the general case. Let's say KVM does its thing and snaps the kvmclock backwards in time on a KVM_REQ_CLOCK_UPDATE... do we really want to reinterpret existing timers against the new kvmclock? They were best left alone, I think.

Do we not want to do exactly that? If the master clock is changed, why 
would we not want to re-interpret the guest's idea of time? That update 
will be visible to the guest when it re-reads the PV clock source.

   Paul