linux-kernel - Re: [PATCH v2 1/3] KVM: x86: implement KVM_{GET|SET}_TSC_STATE

Message-ID: <047afdde655350a6701803aa8ae739a8bd1c1c14.camel@redhat.com>
Date:   Tue, 08 Dec 2020 19:08:00 +0200
From:   Maxim Levitsky <mlevitsk@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Oliver Upton <oupton@...gle.com>
Cc:     kvm list <kvm@...r.kernel.org>, "H. Peter Anvin" <hpa@...or.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        Jim Mattson <jmattson@...gle.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Marcelo Tosatti <mtosatti@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        open list <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
        Joerg Roedel <joro@...tes.org>, Borislav Petkov <bp@...en8.de>,
        Shuah Khan <shuah@...nel.org>,
        Andrew Jones <drjones@...hat.com>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v2 1/3] KVM: x86: implement KVM_{GET|SET}_TSC_STATE

On Tue, 2020-12-08 at 17:40 +0100, Thomas Gleixner wrote:
> On Tue, Dec 08 2020 at 13:13, Maxim Levitsky wrote:
> > On Mon, 2020-12-07 at 11:29 -0600, Oliver Upton wrote:
> > > How would a VMM maintain the phase relationship between guest TSCs
> > > using these ioctls?
> > 
> > By using the nanosecond timestamp. 
> >  
> > While I did make it optional in V2, that was done solely to be able to set
> > the TSC to 0 from QEMU on (re)boot, and for cases where QEMU migrates from
> > a VM where the feature is not enabled.
> > In that case the TSC is set to exactly the given value, just like you
> > can do today with KVM_SET_MSRS.
> > In all other cases the nanosecond timestamp will be given.
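A minimal sketch of the two restore paths described above. It assumes (this is not taken from the patch) that "using the timestamp" means advancing the saved guest TSC by the host wall-clock time elapsed since the read, converted to guest ticks. `TscState` and `tsc_on_restore` are hypothetical names, not the patch's uAPI struct or a KVM function.

```python
from dataclasses import dataclass
from typing import Optional

NSEC_PER_SEC = 1_000_000_000


@dataclass
class TscState:
    """Hypothetical saved state; NOT the uAPI struct from the patch."""
    tsc: int              # guest TSC value read at save time
    nsec: Optional[int]   # host realtime timestamp (ns) of that read, or None


def tsc_on_restore(state: TscState, now_ns: int, guest_hz: int) -> int:
    """Guest TSC value the target should present at host time now_ns."""
    if state.nsec is None:
        # No timestamp (e.g. reset to 0, or source without the feature):
        # set the TSC to exactly the given value, like a plain MSR write.
        return state.tsc
    # With a timestamp (assumed semantics): advance by the wall-clock time
    # elapsed since the read, converted to guest ticks (truncating).
    return state.tsc + (now_ns - state.nsec) * guest_hz // NSEC_PER_SEC


hz = 2_000_000_000  # assumed 2 GHz guest TSC
print(tsc_on_restore(TscState(tsc=0, nsec=None), 7_500_000_000, hz))              # 0
print(tsc_on_restore(TscState(tsc=1000, nsec=5_000_000_000), 7_500_000_000, hz))  # 5000001000
```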
> >  
> > When userspace uses the nanosecond timestamp, the phase relationship
> > would not only be maintained but also be exact, even if the TSC reads were
> > not synchronized and even if their restore on the target wasn't synchronized either.
> >  
> > Here is an example:
> >  
> > Let's assume that the TSCs on source and target are synchronized, and that
> > the guest TSC is synchronized as well.
> > 
> > Let's call the guest TSC frequency F (the guest TSC increments by F each second).
> >  
> > We do KVM_GET_TSC_STATE on vCPU0 and receive (t0, tsc0).
> > We do KVM_GET_TSC_STATE on vCPU1 after 1 second has passed (exaggerated)
> > and receive (t0 + 1s, tsc0 + F).
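The example can be carried one step further with a short sketch. It assumes the target extrapolates each saved (timestamp, TSC) pair to its own restore time as tsc + (now - t) * F; that projection rule and the helper name `project` are illustrative assumptions, and the numbers are arbitrary. Restored at the same instant, both vCPUs land on the same value; restored 2 s apart, they differ by exactly 2 s worth of ticks.

```python
NSEC_PER_SEC = 1_000_000_000


def project(saved_ns: int, saved_tsc: int, now_ns: int, hz: int) -> int:
    """Extrapolate a guest TSC reading taken at saved_ns to host time now_ns."""
    return saved_tsc + (now_ns - saved_ns) * hz // NSEC_PER_SEC


F = 1_000_000_000                          # assumed guest TSC frequency (1 GHz)
t0, tsc0 = 100 * NSEC_PER_SEC, 5_000_000   # arbitrary illustrative values

# The two reads from the example: vCPU1 is read one second after vCPU0.
vcpu0 = (t0, tsc0)
vcpu1 = (t0 + NSEC_PER_SEC, tsc0 + F)

# Restore both at the same target time T: the extrapolated values are equal.
T = t0 + 30 * NSEC_PER_SEC
r0 = project(*vcpu0, T, F)
r1 = project(*vcpu1, T, F)
print(r0 == r1)        # True

# Restore vCPU1 two seconds after vCPU0: it is ahead by exactly the 2 s
# worth of ticks that elapsed in between, so the phase is preserved.
r1_late = project(*vcpu1, T + 2 * NSEC_PER_SEC, F)
print(r1_late - r0)    # 2000000000
```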
> 
> Why?
> 
> You freeze the VM and store the realtime timestamp of doing that. At
> that point, assuming a fully synchronized host system, the only interesting
> thing to store is the guest offset, which is the same on all vCPUs and is
> already known.
> 
> So on restore the only thing which needs to be adjusted is the guest
> wide offset.
> 
>      newoffset = oldoffset + (now - tfreeze)
> 
> Then set newoffset for all vCPUs. Anything else is complexity for no
> value and bound to fall apart in hard-to-debug ways.
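A minimal sketch of the single guest-wide offset described above. It assumes the model guest TSC = host TSC + offset (TSC scaling ignored) and that `now` and `tfreeze` have already been converted to TSC ticks so the units match the offset. `restore_offset` and `guest_tsc` are hypothetical helpers, not KVM or QEMU functions, and the numbers are made up. The offset is computed once, every vCPU gets the same value, and a vCPU restored 3 minutes later simply sees a TSC that is 3 minutes further along.

```python
F = 1_000_000_000  # assumed TSC frequency (1 GHz), used only to express "3 minutes"


def restore_offset(old_offset: int, now: int, t_freeze: int) -> int:
    """The quoted formula: newoffset = oldoffset + (now - tfreeze).
    now and t_freeze are assumed to already be in TSC ticks."""
    return old_offset + (now - t_freeze)


def guest_tsc(host_tsc: int, guest_offset: int) -> int:
    """Single-offset model: guest TSC = host TSC + guest-wide offset."""
    return host_tsc + guest_offset


# Compute the offset once and hand the same value to every vCPU.
new_offset = restore_offset(old_offset=-50_000, now=9_000_000, t_freeze=8_000_000)
offsets = {vcpu: new_offset for vcpu in range(4)}

# vCPU0 is restored at one host TSC value, vCPU1 3 minutes later.
t_vcpu0 = 10_000_000_000
t_vcpu1 = t_vcpu0 + 180 * F
seen0 = guest_tsc(t_vcpu0, offsets[0])
seen1 = guest_tsc(t_vcpu1, offsets[1])

print(seen1 - seen0 == 180 * F)                 # True: guest TSC moved on by 3 minutes too
print(guest_tsc(t_vcpu1, offsets[0]) == seen1)  # True: both vCPUs agree at the same instant
```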
> 
> The offset is still the same for all vCPUs whether you can restore them
> in the same nanosecond or whether you need 3 minutes for each one. It
> does not matter, because when you restore vCPU1 3 minutes after vCPU0,
> the TSC has advanced by 3 minutes as well. It's still correct from the
> guest's POV.
> 
> Even if you support TSC_ADJUST and let the guest write to it, that does
> not change the per-guest offset at all. TSC_ADJUST is per [v]CPU and adds
> on top:
> 
>     tscvcpu = tsc_host + guest_offset + TSC_ADJUST
> 
> Scaling is just orthogonal and does not change any of this.
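A small sketch of the composition tscvcpu = tsc_host + guest_offset + TSC_ADJUST. It assumes, for illustration, that scaling is modelled as an integer num/den ratio applied to the host TSC before the offsets are added; `vcpu_tsc` is a hypothetical helper. A guest write to one vCPU's TSC_ADJUST moves only that vCPU's reading, and the guest-wide offset stays untouched with or without scaling.

```python
def vcpu_tsc(tsc_host: int, guest_offset: int, tsc_adjust: int,
             num: int = 1, den: int = 1) -> int:
    """tscvcpu = tsc_host + guest_offset + TSC_ADJUST, with the host TSC
    optionally scaled by an assumed num/den ratio before the offsets."""
    scaled = tsc_host * num // den
    return scaled + guest_offset + tsc_adjust


guest_offset = 12_345   # one value for the whole guest
tsc_adjust = [0, 0]     # per-vCPU TSC_ADJUST, both initially 0
host = 1_000_000

print(vcpu_tsc(host, guest_offset, tsc_adjust[0]) ==
      vcpu_tsc(host, guest_offset, tsc_adjust[1]))   # True

# The guest writes TSC_ADJUST on vCPU1 only; guest_offset is untouched.
tsc_adjust[1] += 500
print(vcpu_tsc(host, guest_offset, tsc_adjust[1]) -
      vcpu_tsc(host, guest_offset, tsc_adjust[0]))   # 500

# Scaling (here 3/2) changes how host ticks map to guest ticks, not the offset.
print(vcpu_tsc(host, guest_offset, tsc_adjust[0], num=3, den=2))  # 1512345
```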

I agree with this, and I think that this is what we will end up doing.
Paolo, what do you think about this?

Best regards,
	Maxim Levitsky

> 
> Thanks,
> 
>         tglx
>