Message-ID: <alpine.DEB.2.20.1612161121140.3470@nanos>
Date: Fri, 16 Dec 2016 12:46:12 +0100 (CET)
From: Thomas Gleixner <tglx@...utronix.de>
To: LKML <linux-kernel@...r.kernel.org>
cc: x86@...nel.org, Peter Zijlstra <peterz@...radead.org>,
Borislav Petkov <bp@...en8.de>,
Bruce Schlobohm <bruce.schlobohm@...el.com>,
Roland Scheidegger <rscheidegger_lists@...peed.ch>,
Kevin Stanton <kevin.b.stanton@...el.com>,
Allen Hung <allen_hung@...l.com>, stable@...r.kernel.org
Subject: Re: [patch 2/2] x86/tsc: Force TSC_ADJUST register to value >= zero
On Tue, 13 Dec 2016, Thomas Gleixner wrote:
> Roland reported that his DELL T5810 sports a value add BIOS which
> completely wreckages the TSC. The squirmware [(TM) Ingo Molnar] boots with
> random negative TSC_ADJUST values, different on all CPUs. That renders the
> TSC useless because the synchronization check fails.
While everyone assumed that this is the usual DELL squirmware problem, I
have to say it's not.
I just got my hands on a Skylake-based Lenovo S510 box and it shows the same
feature:
  TSC ADJUST: CPU0: -10123656703215
              CPU1: -10123656796701
              CPU2: -10123656797460
              CPU3: -10123656798366
This causes the TSC to be out of sync on a stock upstream kernel, and the
TSC deadline timer wreckage happens on that machine as well.
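To make the spread visible, here is a minimal user-space sketch that computes each CPU's TSC_ADJUST offset relative to CPU0 from the values listed above. The helper name tsc_adjust_deltas() and the array s510_tsc_adjust are made up for illustration; nothing here reads the real MSR or mirrors kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* TSC_ADJUST values reported above for the Lenovo S510, CPU0..CPU3. */
static const int64_t s510_tsc_adjust[4] = {
    -10123656703215LL,
    -10123656796701LL,
    -10123656797460LL,
    -10123656798366LL,
};

/*
 * Hypothetical helper (not a kernel function): store each CPU's offset
 * relative to CPU0 in delta[] and return how many CPUs differ from CPU0.
 * Any non-zero return means the per-CPU adjustments disagree.
 */
size_t tsc_adjust_deltas(const int64_t *adj, size_t ncpus, int64_t *delta)
{
    size_t i, mismatches = 0;

    assert(ncpus > 0);
    for (i = 0; i < ncpus; i++) {
        delta[i] = adj[i] - adj[0];
        if (delta[i] != 0)
            mismatches++;
    }
    return mismatches;
}
```

Called as tsc_adjust_deltas(s510_tsc_adjust, 4, delta), this reports three mismatching CPUs, with offsets of -93486, -94245 and -95151 TSC ticks for CPU1..CPU3 relative to CPU0.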
I'm pretty sure that this well thought out feature to 'hide power on time'
from TSC has not been independently 'invented' by DELL and Lenovo BIOS
tinkerers.
I rather have the impression that this is an advisory or feature kit from
some other entity. Whoever came up with this misfeature at Intel and/or
Microsoft (sorry, I could not come up with any other suspects) should be
promoted to run the 'Linux on feature-plagued systems' hot line.
As this seems to be more widespread than we thought initially, we have to
think about a solution for stable kernels, especially 4.9. And distros will
have to think about that as well....
We have two options:
 1) Disable TSC deadline timer by default and force users with sane machines
    to enable it on the kernel command line.
    Upside:   Very small patch
    Downside: Degrades existing setups on sane machines, keeps TSC unusable
              on affected machines. We have no idea what other hidden side
              effects the TSC_ADJUST tinkering has. If there are any, they
              won't be nice ones.
 2) Push the whole TSC_ADJUST sanitizing machinery into stable
    Upside:   Does not affect sane machines and gives a benefit to users of
              affected machines
    Downside: Rather large patch, but not that risky either. Needs a few
              eyes and good test coverage though
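For readers who want a feel for what option 2 amounts to, the following is a rough user-space model, not the actual patch. It assumes the policy named in the subject line (force negative values to zero) plus one extra assumption of mine: every other CPU is rewritten to match the sanitized CPU0 value so that all CPUs agree. The function name tsc_adjust_sanitize() is hypothetical, and the array store stands in for a write to the TSC_ADJUST MSR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of TSC_ADJUST sanitizing (not kernel code).
 *
 * adj[] holds the per-CPU values found at boot, adj[0] being the boot CPU.
 * A negative boot CPU value is forced to zero; every CPU that disagrees
 * with the resulting reference is rewritten to match it (an assumption,
 * see the text above). Returns the number of values that were rewritten,
 * so a sane machine yields 0 and is left untouched.
 */
size_t tsc_adjust_sanitize(int64_t *adj, size_t ncpus)
{
    size_t i, rewritten = 0;
    int64_t ref;

    assert(ncpus > 0);
    ref = adj[0] < 0 ? 0 : adj[0];  /* force the reference to >= zero */

    for (i = 0; i < ncpus; i++) {
        if (adj[i] != ref) {
            adj[i] = ref;           /* stands in for an MSR write */
            rewritten++;
        }
    }
    return rewritten;
}
```

Fed the four S510 values, the model rewrites all four entries to zero. Fed values that are already consistent and non-negative, it changes nothing, which is the "does not affect sane machines" property claimed for this option.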
Thoughts?
tglx