lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2a542f07-2158-16aa-e3cb-5431081ee1f6@gmail.com>
Date:   Tue, 1 Aug 2023 10:26:22 +0800
From:   Like Xu <like.xu.linux@...il.com>
To:     Oliver Upton <oliver.upton@...ux.dev>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] KVM: x86/tsc: Don't sync user changes to TSC with
 KVM-initiated change

On 1/8/2023 2:29 am, Oliver Upton wrote:
> On Mon, Jul 31, 2023 at 04:07:58PM +0800, Like Xu wrote:
>> From: Like Xu <likexu@...cent.com>
>>
>> Add kvm->arch.user_changed_tsc to avoid synchronizing user changes to
>> the TSC with the KVM-initiated change in kvm_arch_vcpu_postcreate() by
>> conditioning this mess on userspace having written the TSC at least
>> once already.
>>
>> Here lies UAPI baggage: user-initiated TSC write with a small delta
>> (1 second) of virtual cycle time against real time is interpreted as an
>> attempt to synchronize the CPU. In such a scenario, the vcpu's tsc_offset
>> is not configured as expected, resulting in significant guest service
>> response latency, which is observed in our production environment.
> 
> The changelog reads really weird, because it is taken out of context
> when it isn't a comment over the affected code. Furthermore, 'our
> production environment' is a complete black box to the rest of the
> community, it would be helpful spelling out exactly what the use case
> is.
> 
> Suggested changelog:
> 
>    KVM interprets writes to the TSC with values within 1 second of each
>    other as an attempt to synchronize the TSC for all vCPUs in the VM,
>    and uses a common offset for all vCPUs in a VM. For brevity's sake
>    let's just ignore what happens on systems with an unstable TSC.
> 
>    While this may seem odd, it is imperative for VM save/restore, as VMMs
>    such as QEMU have long resorted to saving the TSCs (by value) from all
>    vCPUs in the VM at approximately the same time. Of course, it is
>    impossible to synchronize all the vCPU ioctls to capture the exact
>    instant in time, hence KVM fudges it a bit on the restore side.
> 
>    This has been useful for the 'typical' VM lifecycle, where in all
>    likelihood the VM goes through save/restore a considerable amount of
>    time after VM creation. Nonetheless, there are some use cases that
>    need to restore a VM snapshot that was created very shortly after boot
>    (<1 second). Unfortunately the TSC sync code makes no distinction
>    between kernel and user-initiated writes, which leads to the target VM
>    synchronizing on the TSC offset from creation instead of the
>    user-intended value.

Great clarification. Thanks, we're on the same page.

> 
>    Avoid synchronizing user-initiated changes to the guest TSC with the
>    KVM initiated change in kvm_arch_vcpu_postcreate() by conditioning the
>    logic on userspace having written the TSC at least once.
> 
> I'll also note that the whole value-based TSC sync scheme is in
> desperate need of testing.
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ