Message-ID: <DM5PR21MB0137E03AAD8C2EA61EC81ED7D7D30@DM5PR21MB0137.namprd21.prod.outlook.com>
Date:   Mon, 12 Aug 2019 19:22:25 +0000
From:   Michael Kelley <mikelley@...rosoft.com>
To:     Tianyu Lan <lantianyu1986@...il.com>,
        vkuznets <vkuznets@...hat.com>
CC:     Peter Zijlstra <peterz@...radead.org>,
        Tianyu Lan <Tianyu.Lan@...rosoft.com>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        the arch/x86 maintainers <x86@...nel.org>,
        KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Sasha Levin <sashal@...nel.org>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Arnd Bergmann <arnd@...db.de>,
        "ashal@...nel.org" <ashal@...nel.org>
Subject: RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock
 function

From: Tianyu Lan <lantianyu1986@...il.com> Sent: Tuesday, July 30, 2019 6:41 AM
> 
> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@...hat.com> wrote:
> >
> > Peter Zijlstra <peterz@...radead.org> writes:
> >
> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> > >> lantianyu1986@...il.com writes:
> > >>
> > >> > From: Tianyu Lan <Tianyu.Lan@...rosoft.com>
> > >> >
> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> > >> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
> > >> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
> > >> > to set the sched clock function appropriately.  On x86, this sets
> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> > >> > scaled and adjusted to be continuous.
> > >>
> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use
> > >> MSR-based clocksource but using it as sched_clock() can be very slow,
> > >> I'm afraid.
> > >>
> > >> On the other hand, what we have now is probably worse: TSC can,
> > >> actually, jump backwards (e.g. on migration) and we're breaking the
> > >> requirements for sched_clock().
> > >
> > > That (obviously) also breaks the requirements for using TSC as
> > > clocksource.
> > >
> > > IOW, it breaks the entire purpose of having TSC in the first place.
> >
> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
> > instead. The problem is that 'TSC page' can be disabled by the
> > hypervisor and in that case the only remaining clocksource is MSR-based
> > (slow).
> >
> 
> Yes, that will be slow if Hyper-V doesn't expose the hv TSC page and the
> kernel uses the MSR-based clocksource. Each MSR read will trigger one
> VM-EXIT. This also happens on other hypervisors (e.g. when KVM doesn't
> expose the KVM clock). The hypervisor should take this into account when
> determining which clocksources to expose.
> 

We've confirmed with the Hyper-V team that the TSC page is always available
on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
hardware presents an InvariantTSC.  But the Linux Kconfigs are set up so
the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
reads.  For 32-bit, this set of changes will add more overhead because the
sched clock reads will now be MSR reads.

I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
supported in Azure so usage is likely small.  The alternative would be to continue
to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
live migration or similar scenarios.

Michael
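
For readers following the thread, the "TSC page" computation quoted above
(TSC * scale + offset) can be sketched roughly as below. This is only an
illustration, not the kernel's code: the struct and function names are
hypothetical, the 64-bit right shift after the multiply is an assumption
drawn from the Hyper-V reference time definition rather than from this
thread, and unsigned __int128 is a GCC/Clang extension. The point is that
the hypervisor can change scale and offset (for example after a migration)
so that the scaled value stays continuous even when the raw TSC does not.

```c
#include <stdint.h>

/* Hypothetical stand-in for the scale/offset fields of the reference TSC page. */
struct ref_tsc_page {
    uint64_t scale;   /* fixed-point multiplier, interpreted as scale / 2^64 */
    int64_t  offset;  /* added after scaling */
};

/*
 * Compute the reference time from a raw TSC reading:
 *   ((tsc * scale) >> 64) + offset
 * The 128-bit intermediate keeps the full product before the shift.
 */
static uint64_t ref_time_from_tsc(const struct ref_tsc_page *pg, uint64_t tsc)
{
    unsigned __int128 product = (unsigned __int128)tsc * pg->scale;

    return (uint64_t)(product >> 64) + (uint64_t)pg->offset;
}
```

With made-up numbers: a page with scale = 2^63 (a factor of 0.5) and offset 0
maps a raw TSC of 2000 to 1000. If the guest then lands on a host where the
raw TSC reads 100, a page with scale = 2^62 (0.25) and offset 975 maps that
reading to 1000 as well, so the scaled clock does not jump even though the
raw TSC went backwards.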
