Message-ID: <1443453446-7827-7-git-send-email-cmetcalf@ezchip.com>
Date: Mon, 28 Sep 2015 11:17:21 -0400
From: Chris Metcalf <cmetcalf@...hip.com>
To: Gilad Ben Yossef <giladb@...hip.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
"Rik van Riel" <riel@...hat.com>, Tejun Heo <tj@...nel.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Christoph Lameter <cl@...ux.com>,
Viresh Kumar <viresh.kumar@...aro.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will.deacon@....com>,
Andy Lutomirski <luto@...capital.net>,
<linux-kernel@...r.kernel.org>
CC: Chris Metcalf <cmetcalf@...hip.com>
Subject: [PATCH v7 06/11] nohz: task_isolation: allow tick to be fully disabled
While the current fallback to 1-second tick is still helpful for
maintaining completely correct kernel semantics, processes using
prctl(PR_SET_TASK_ISOLATION) semantics place a higher priority on
running completely tickless, so don't bound the time_delta for such
processes. In addition, due to the way such processes quiesce by
waiting for the timer tick to stop prior to returning to userspace,
without this commit it won't be possible to use the task_isolation
mode at all.
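As a rough illustration of the bounding rule described above, the
following userspace sketch models the decision this patch changes. It
is not the kernel code: bound_tick_delta() and max_deferment_ns() are
hypothetical names, the isolation state is passed in as a plain flag
rather than queried via task_isolation_enabled(), and the one-second
limit is assumed from the "1-second tick" fallback mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical stand-in for scheduler_tick_max_deferment(); assumed
 * here to be a fixed one second, matching the 1Hz fallback tick.
 */
uint64_t max_deferment_ns(void)
{
	return NSEC_PER_SEC;
}

/*
 * Model of the check in tick_nohz_stop_sched_tick(): the requested
 * deferral is capped only when the CPU is not idle and the current
 * task has not asked for task isolation.
 */
uint64_t bound_tick_delta(uint64_t delta_ns, bool in_idle, bool isolated)
{
	if (!in_idle && !isolated && delta_ns > max_deferment_ns())
		return max_deferment_ns();

	/* Idle CPUs and isolated tasks keep the full requested deferral. */
	return delta_ns;
}
```

With these assumptions, a five-second request from a busy,
non-isolated task is cut to one second, while the same request from an
isolated task is left untouched.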
Removing the 1-second cap was previously discussed (see link
below) and Thomas Gleixner observed that vruntime, load balancing
data, load accounting, and other things might be impacted.
Frederic Weisbecker similarly observed that allowing the tick to
be indefinitely deferred just meant that no one would ever fix the
underlying bugs. However, it's at least true that the mode proposed
in this patch can only be enabled on a nohz_full core by a process
requesting task_isolation mode, which may limit how important it is
to maintain scheduler data correctly, for example.
Paul McKenney observed that if we provide a mode where the 1Hz fallback
timer is removed, this will provide an environment where new code
that relies on that tick will get punished, and we won't forgive
such assumptions silently, so it may also be worth it from that
perspective.
Finally, it's worth observing that the tile architecture has been
using similar code for its Zero-Overhead Linux for many years
(starting in 2008) and customers are very enthusiastic about the
resulting bare-metal performance on cores that are available to
run full Linux semantics on demand (crash, logging, shutdown, etc).
So these semantics are very useful if we can convince ourselves that
doing this is safe.
Link: https://lkml.kernel.org/r/alpine.DEB.2.11.1410311058500.32582@gentwo.org
Signed-off-by: Chris Metcalf <cmetcalf@...hip.com>
---
kernel/time/tick-sched.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3319e16f31e5..4504c0b95d0d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -24,6 +24,7 @@
#include <linux/posix-timers.h>
#include <linux/perf_event.h>
#include <linux/context_tracking.h>
+#include <linux/isolation.h>
#include <asm/irq_regs.h>
@@ -634,7 +635,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
#ifdef CONFIG_NO_HZ_FULL
/* Limit the tick delta to the maximum scheduler deferment */
- if (!ts->inidle)
+ if (!ts->inidle && !task_isolation_enabled())
delta = min(delta, scheduler_tick_max_deferment());
#endif
--
2.1.2