Message-ID: <87sgq5a2hq.fsf@vitty.brq.redhat.com>
Date: Tue, 13 Aug 2019 10:33:37 +0200
From: Vitaly Kuznetsov <vkuznets@...hat.com>
To: Michael Kelley <mikelley@...rosoft.com>,
Tianyu Lan <lantianyu1986@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Tianyu Lan <Tianyu.Lan@...rosoft.com>,
"linux-arch\@vger.kernel.org" <linux-arch@...r.kernel.org>,
"linux-hyperv\@vger.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel\@vger kernel org" <linux-kernel@...r.kernel.org>,
Andy Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>,
the arch/x86 maintainers <x86@...nel.org>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
Sasha Levin <sashal@...nel.org>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
Arnd Bergmann <arnd@...db.de>,
"ashal\@kernel.org" <ashal@...nel.org>
Subject: RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
Michael Kelley <mikelley@...rosoft.com> writes:
> From: Tianyu Lan <lantianyu1986@...il.com> Sent: Tuesday, July 30, 2019 6:41 AM
>>
>> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@...hat.com> wrote:
>> >
>> > Peter Zijlstra <peterz@...radead.org> writes:
>> >
>> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
>> > >> lantianyu1986@...il.com writes:
>> > >>
>> > >> > From: Tianyu Lan <Tianyu.Lan@...rosoft.com>
>> > >> >
>> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
>> > >> > on x86. But native_sched_clock() directly uses the raw TSC value, which
>> > >> > can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock()
>> > >> > to set the sched clock function appropriately. On x86, this sets
>> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
>> > >> > scaled and adjusted to be continuous.
>> > >>
>> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use
>> > >> MSR-based clocksource but using it as sched_clock() can be very slow,
>> > >> I'm afraid.
>> > >>
>> > >> On the other hand, what we have now is probably worse: TSC can,
>> > >> actually, jump backwards (e.g. on migration) and we're breaking the
>> > >> requirements for sched_clock().
>> > >
>> > > That (obviously) also breaks the requirements for using TSC as
>> > > clocksource.
>> > >
>> > > IOW, it breaks the entire purpose of having TSC in the first place.
>> >
>> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
>> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
>> > instead. The problem is that 'TSC page' can be disabled by the
>> > hypervisor and in that case the only remaining clocksource is MSR-based
>> > (slow).
>> >
>>
>> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and kernel uses
>> MSR based clocksource. Each MSR read will trigger one VM-EXIT. This also
>> happens on other hypervisors (e.g. KVM doesn't expose KVM clock). Hypervisor
>> should take this into account and determine which clocksource should be
>> exposed or not.
>>
>
> We've confirmed with the Hyper-V team that the TSC page is always available
> on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
> hardware presents an InvariantTSC.
Currently we check that the TSC page is valid on every read, and it seems
this is redundant, right? It is either available on boot or not. The only
case I can imagine is migrating a VM to a non-InvariantTSC host, where
Hyper-V will likely disable the page (and we can get a reenlightenment
notification then).
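
To make the per-read validity check concrete, here is a rough userspace sketch of a 'TSC page' read (TSC * scale + offset, with the product taken as the upper 64 bits of the 128-bit multiply). This is not the kernel's code: the struct layout and the names tsc_page_sketch, tsc_page_read_sketch and TSC_PAGE_INVALID are made up for illustration, the raw TSC is passed in as a parameter instead of being read with RDTSC, and the memory barriers a real reader needs are left out. It assumes a sequence value of 0 marks the page as invalid.

```c
#include <stdint.h>

/* Hypothetical mirror of the fields a TSC page reader consumes. */
struct tsc_page_sketch {
	volatile uint32_t sequence;	/* assumed: 0 means "page not valid" */
	uint32_t reserved;
	volatile uint64_t scale;	/* 64.64 fixed-point multiplier */
	volatile int64_t offset;	/* added after scaling */
};

/* Sentinel telling the caller to fall back to the (slow) MSR read. */
#define TSC_PAGE_INVALID UINT64_MAX

/*
 * Scale a raw TSC value through the page. The validity check (sequence == 0)
 * runs on every call, which is the check discussed above. The loop retries
 * if the hypervisor changed the page while scale/offset were being read.
 * unsigned __int128 is a GCC/Clang extension.
 */
static uint64_t tsc_page_read_sketch(const struct tsc_page_sketch *pg,
				     uint64_t raw_tsc)
{
	uint32_t seq;
	uint64_t scale;
	int64_t offset;

	do {
		seq = pg->sequence;
		if (seq == 0)
			return TSC_PAGE_INVALID;
		scale = pg->scale;
		offset = pg->offset;
	} while (pg->sequence != seq);

	/* (raw_tsc * scale) >> 64, then add the (possibly negative) offset. */
	return (uint64_t)(((unsigned __int128)raw_tsc * scale) >> 64) +
	       (uint64_t)offset;
}
```

If the page really cannot become invalid after boot, the sequence == 0 branch is the part that would move out of the hot path; the retry loop would still be needed for consistent scale/offset reads.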
> But the Linux Kconfig's are set up so
> the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
> reads. For 32-bit, this set of changes will add more overhead because the
> sched clock reads will now be MSR reads.
>
> I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
> I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
> supported in Azure so usage is pretty small. The alternative would be to continue
> to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
> live migration or similar scenarios.
The issue needs fixing, I agree. However, using the MSR-based clocksource as
sched clock may give us too big of a performance hit (not sure who cares
about 32-bit guest performance nowadays, but still). What stops us from
enabling the TSC page for 32-bit guests if it is available?
--
Vitaly