Message-ID: <20160811084002.GV30192@twins.programming.kicks-ass.net>
Date: Thu, 11 Aug 2016 10:40:02 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Frederic Weisbecker <fweisbec@...il.com>
Cc: Christoph Lameter <cl@...ux.com>,
Chris Metcalf <cmetcalf@...lanox.com>,
Gilad Ben Yossef <giladb@...lanox.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>, Tejun Heo <tj@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Viresh Kumar <viresh.kumar@...aro.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will.deacon@....com>,
Andy Lutomirski <luto@...capital.net>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
linux-doc@...r.kernel.org, linux-api@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: clocksource_watchdog causing scheduling of timers every second
(was [v13] support "task_isolation" mode)
On Thu, Aug 11, 2016 at 12:16:58AM +0200, Frederic Weisbecker wrote:
> I had similar issues; this seems to happen when the tsc is considered not reliable
> (which doesn't necessarily mean unstable. I think it has to do with some x86 CPU feature
> flag).
Right, as per the other email, in general we cannot know/assume the TSC
to be working as intended :/
> IIRC, this _has_ to execute on all online CPUs because the TSC of every running CPU
> is concerned.
With modern Intel we could run it on one CPU per package, I think. But at
the same time, too much in NOHZ_FULL assumes the TSC is indeed sane, so
it doesn't make sense to me to keep the watchdog running: when it
triggers, it would also have to kill all the NOHZ_FULL stuff, which would
probably bring the entire machine down.
Arguably we should issue a boot-time warning if NOHZ_FULL is configured
and the TSC watchdog is running.
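As a rough userspace approximation of that check (not the in-kernel warning itself), the sketch below inspects a kernel command line string and flags the combination of nohz_full= without tsc=reliable. The function names are hypothetical, and it assumes that the absence of tsc=reliable means the watchdog may be running; the kernel's real decision also depends on CPU feature flags and build configuration, which are not examined here.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {key: [values]} (hypothetical helper)."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Flags without '=' (e.g. "quiet") are stored with a None value.
        params.setdefault(key, []).append(value if sep else None)
    return params


def should_warn(cmdline: str) -> bool:
    """True if nohz_full= is given but tsc=reliable is not.

    Assumption: a missing tsc=reliable is treated as "watchdog may run".
    """
    params = parse_cmdline(cmdline)
    nohz_full = "nohz_full" in params
    tsc_reliable = "reliable" in params.get("tsc", [])
    return nohz_full and not tsc_reliable
```

On a live system the input string would typically come from /proc/cmdline, e.g. `should_warn(open("/proc/cmdline").read())`.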
> I personally override that with passing the tsc=reliable kernel
> parameter. Of course use it at your own risk.
Yes, that is (sadly) our only option. Manually assert that our hardware is
solid under the intended workload and then manually disable the
watchdog.
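One small input to that manual assertion, sketched under assumptions: when the watchdog marks the TSC unstable, the kernel falls back to another clocksource, so checking that "tsc" is still the selected clocksource after running the intended workload (with the watchdog still enabled) gives a coarse signal. The helper name is hypothetical, and passing this check does not by itself prove the TSC is sane.

```python
from pathlib import Path

# Standard sysfs location for the system clocksource on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def tsc_still_selected(base: Path = CLOCKSOURCE_DIR) -> bool:
    """Return True if the currently selected clocksource is "tsc".

    Hypothetical helper: `base` is overridable so the check can be
    exercised against a fake directory in tests.
    """
    current = (base / "current_clocksource").read_text().strip()
    return current == "tsc"
```

A False result after the workload suggests the kernel switched away from the TSC, in which case passing tsc=reliable would be asserting something the hardware did not demonstrate.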