lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZMf9ovBFpGNEOG3c@linux.dev>
Date:   Mon, 31 Jul 2023 18:29:54 +0000
From:   Oliver Upton <oliver.upton@...ux.dev>
To:     Like Xu <like.xu.linux@...il.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] KVM: x86/tsc: Don't sync user changes to TSC with
 KVM-initiated change

On Mon, Jul 31, 2023 at 04:07:58PM +0800, Like Xu wrote:
> From: Like Xu <likexu@...cent.com>
> 
> Add kvm->arch.user_changed_tsc to avoid synchronizing user changes to
> the TSC with the KVM-initiated change in kvm_arch_vcpu_postcreate() by
> conditioning this mess on userspace having written the TSC at least
> once already.
> 
> Here lies UAPI baggage: user-initiated TSC write with a small delta
> (1 second) of virtual cycle time against real time is interpreted as an
> attempt to synchronize the CPU. In such a scenario, the vcpu's tsc_offset
> is not configured as expected, resulting in significant guest service
> response latency, which is observed in our production environment.

The changelog reads really weird, because it is taken out of context
when it isn't a comment over the affected code. Furthermore, 'our
production environment' is a complete black box to the rest of the
community, it would be helpful spelling out exactly what the use case
is.

Suggested changelog:

  KVM interprets writes to the TSC with values within 1 second of each
  other as an attempt to synchronize the TSC for all vCPUs in the VM,
  and uses a common offset for all vCPUs in a VM. For brevity's sake
  let's just ignore what happens on systems with an unstable TSC.

  While this may seem odd, it is imperative for VM save/restore, as VMMs
  such as QEMU have long resorted to saving the TSCs (by value) from all
  vCPUs in the VM at approximately the same time. Of course, it is
  impossible to synchronize all the vCPU ioctls to capture the exact
  instant in time, hence KVM fudges it a bit on the restore side.

  This has been useful for the 'typical' VM lifecycle, where in all
  likelihood the VM goes through save/restore a considerable amount of
  time after VM creation. Nonetheless, there are some use cases that
  need to restore a VM snapshot that was created very shortly after boot
  (<1 second). Unfortunately the TSC sync code makes no distinction
  between kernel and user-initiated writes, which leads to the target VM
  synchronizing on the TSC offset from creation instead of the
  user-intended value.

  Avoid synchronizing user-initiated changes to the guest TSC with the
  KVM initiated change in kvm_arch_vcpu_postcreate() by conditioning the
  logic on userspace having written the TSC at least once.

I'll also note that the whole value-based TSC sync scheme is in
desperate need of testing.

-- 
Thanks,
Oliver

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ