[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181004081100.GI19272@hirez.programming.kicks-ass.net>
Date: Thu, 4 Oct 2018 10:11:00 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Vitaly Kuznetsov <vkuznets@...hat.com>
Cc: Marcelo Tosatti <mtosatti@...hat.com>,
Andy Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krcmar <rkrcmar@...hat.com>,
Wanpeng Li <kernellwp@...il.com>,
LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
Matt Rickard <matt@...trans.com.au>,
Stephen Boyd <sboyd@...nel.org>,
John Stultz <john.stultz@...aro.org>,
Florian Weimer <fweimer@...hat.com>,
KY Srinivasan <kys@...rosoft.com>,
devel@...uxdriverproject.org,
Linux Virtualization <virtualization@...ts.linux-foundation.org>,
Arnd Bergmann <arnd@...db.de>, Juergen Gross <jgross@...e.com>
Subject: Re: [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI
support
On Thu, Oct 04, 2018 at 09:54:45AM +0200, Vitaly Kuznetsov wrote:
> I was hoping to hear this from you :-) If I am to suggest how we can
> move forward I'd propose:
> - Check if pure TSC can be used on SkyLake+ systems (where TSC scaling
> is supported).
> - Check if non-masterclock mode is still needed. E.g. HyperV's TSC page
> clocksource is a single page for the whole VM, not a per-cpu thing. Can
> we think that all the buggy hardware is already gone?
No, and it is not the hardware you have to worry about (mostly), it is
the frigging PoS firmware people put on it.
Ever since Nehalem TSC is stable (unless you get to >4 socket systems,
after which it still can be, but bets are off). But even relatively
recent systems fail the TSC sync test because firmware messes it up by
writing to either MSR_TSC or MSR_TSC_ADJUST.
But the thing is, if the TSC is not synced, you cannot use it for
timekeeping, full stop. So having a single page is fine, it either
contains a mult/shift that is valid, or it indicates TSC is messed up
and you fall back to something else.
There is no inbetween there.
For sched_clock we can still use the global page, because the rate will
still be the same for each cpu, it's just offset between CPUs and the
code compensates for that.
Powered by blists - more mailing lists