[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrXB8CufQujLAg6bbq=DGAMUE293CF7L4Kp+mCSoNWyuBg@mail.gmail.com>
Date: Fri, 17 Feb 2017 09:02:16 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Vitaly Kuznetsov <vkuznets@...hat.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
"K. Y. Srinivasan" <kys@...rosoft.com>, X86 ML <x86@...nel.org>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
Dexuan Cui <decui@...rosoft.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
devel@...uxdriverproject.org,
Linux Virtualization <virtualization@...ts.linux-foundation.org>
Subject: Re: [PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
On Fri, Feb 17, 2017 at 2:14 AM, Vitaly Kuznetsov <vkuznets@...hat.com> wrote:
> Thomas Gleixner <tglx@...utronix.de> writes:
>
>> On Wed, 15 Feb 2017, Vitaly Kuznetsov wrote:
>>> Actually, we already have an implementation of TSC page update in KVM
>>> (see arch/x86/kvm/hyperv.c, kvm_hv_setup_tsc_page()) and the update does
>>> the following:
>>>
>>> 0) stash seq into seq_prev
>>> 1) seq = 0 making all reads from the page invalid
>>> 2) smp_wmb()
>>> 3) update tsc_scale, tsc_offset
>>> 4) smp_wmb()
>>> 5) set seq = seq_prev + 1
>>
>> I hope they handle the case where seq_prev overflows and becomes 0 :)
>>
>>> As far as I understand this helps with situations you described above as
>>> guest will notice either invalid value of 0 or seq change. In case the
>>> implementation in real Hyper-V is the same we're safe with compile
>>> barriers only.
>>
>> On x86 that's correct. smp_rmb() resolves to barrier(), but you certainly
>> need the smp_wmb() on the writer side.
>>
>> Now looking at the above your reader side code is bogus:
>>
>> + while (1) {
>> + sequence = tsc_pg->tsc_sequence;
>> + if (!sequence)
>> + break;
>>
>> Why would you break out of the loop when seq is 0? The 0 is just telling
>> you that there is an update in progress.
>
> Not only. As far as I understand (and I *think* K. Y. pointed this out)
> when VM is migrating to another host TSC page clocksource is disabled for
> extended period of time so we're better off reading from MSR than
> looping here. With regards to VDSO this means reverting to doing normal
> syscall.
That's a crappy design. The guest really ought to be able to
distinguish "busy, try again" from "bail and use MSR".
Thomas, I can see valid reasons why the hypervisor might want to
temporarily disable shared page-based timing, but I think it's silly
that it's conflated with indicating "retry".
But if this is indeed the ABI, we're stuck with it.
Powered by blists - more mailing lists