[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2eaf612b-1ce3-0dfe-5d2e-2cf29bba7641@gmail.com>
Date: Wed, 13 Sep 2023 18:59:30 +0800
From: Like Xu <like.xu.linux@...il.com>
To: David Woodhouse <dwmw2@...radead.org>,
Sean Christopherson <seanjc@...gle.com>
Cc: Oliver Upton <oliver.upton@...ux.dev>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH v6] KVM: x86/tsc: Don't sync user-written TSC against
startup values
On 13/9/2023 6:51 pm, David Woodhouse wrote:
> On Wed, 2023-09-13 at 18:37 +0800, Like Xu wrote:
>> From: Like Xu <likexu@...cent.com>
>>
>> The legacy API for setting the TSC is fundamentally broken, and only
>> allows userspace to set a TSC "now", without any way to account for
>> time lost to preemption between the calculation of the value, and the
>> kernel eventually handling the ioctl.
>>
>> To work around this we have had a hack which, if a TSC is set with a
>> value which is within a second's worth of a previous vCPU, assumes that
>> userspace actually intended them to be in sync and adjusts the newly-
>> written TSC value accordingly.
>>
>> Thus, when a VMM restores a guest after suspend or migration using the
>> legacy API, the TSCs aren't necessarily *right*, but at least they're
>> in sync.
>>
>> This trick falls down when restoring a guest which genuinely has been
>> running for less time than the 1 second of imprecision which we allow
>> for in the legacy API. On *creation* the first vCPU starts its TSC
>> counting from zero, and the subsequent vCPUs synchronize to that. But
>> then when the VMM tries to set the intended TSC value, because that's
>> within a second of what the last TSC synced to, it just adjusts it to
>> match that.
>>
> Proofreading my own words here... "it just adjusts it to match" is
> using the same pronoun for different things and is probably hard to
> follow. Perhaps "KVM just adjusts it to match" is nicer.
>
>> The correct answer is for the VMM not to use the legacy API of course.
>>
>> But we can pile further hacks onto our existing hackish ABI, and
>> declare that the *first* value written by userspace (on any vCPU)
>> should not be subject to this 'correction' to make it sync up with
>> values that only from the kernel's default vCPU creation.
>
> ^^
> ... that only *come* from the kernel's...
>
>
>>
>> To that end: Add a flag in kvm->arch.user_set_tsc, protected by
>> kvm->arch.tsc_write_lock, to record that a TSC for at least one vCPU in
>> this KVM *has* been set by userspace. Make the 1-second slop hack only
>> trigger if that flag is already set.
>>
>> Reported-by: Yong He <alexyonghe@...cent.com>
>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217423
>> Suggested-by: Oliver Upton <oliver.upton@...ux.dev>
>> Original-by: Oliver Upton <oliver.upton@...ux.dev>
>> Original-by: Sean Christopherson <seanjc@...gle.com>
>> Co-developed-by: David Woodhouse <dwmw2@...radead.org>
>> Signed-off-by: David Woodhouse <dwmw2@...radead.org>
>> Signed-off-by: Like Xu <likexu@...cent.com>
>> Tested-by: Yong He <alexyonghe@...cent.com>
>
> Reviewed-by: David Woodhouse <dwmw@...zon.co.uk>
>
> Please remove the 'Signed-off-by' from me. You must never ever *type* a
> signed-off-by line for anyone else. You only ever cut and paste those
> intact when they have provided them for *themselves*.
Nice rule, sorry and thanks for the guidance.
>
> It's OK to remove the Co-developed-by: too. You did the actual typing
> of the code here; I just heckled :)
Thank you for reviewing it.
I'll wait for a cooling off period to see if the maintainers need me to post v7.
Powered by blists - more mailing lists