[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1429538139.2042.10.camel@stgolabs.net>
Date: Mon, 20 Apr 2015 06:55:39 -0700
From: Davidlohr Bueso <dave@...olabs.net>
To: Ingo Molnar <mingo@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Chris Mason <clm@...com>, Steven Rostedt <rostedt@...dmis.org>,
fredrik.markstrom@...driver.com, linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 2/2] futex: lockless wakeups
On Mon, 2015-04-20 at 08:18 +0200, Ingo Molnar wrote:
> Please write a small description we can cite to driver authors once
> the (inevitable) breakages appear, outlining this new behavior and its
> implications, so that we can fix any remaining bugs ASAP.
I wouldn't call this new behavior, simply because changing a critical
region should not be labeled as such imho. Other than asking driver
authors to put their schedule() in a loop to confirm that the expected
condition has in fact occurred, I'm not sure what else we can ask them
to do -- as you know, this is not just about futexes.
> I'll also let this pending a bit longer than other changes, to make
> sure we shake out any bugs/regressions triggered by this change.
>
> Third, it might make sense to add a new 'spurious wakeup injection
> debug mechanism' that, if enabled in the .config, automatically and
> continuously inserts spurious wakeups at a given, slightly randomized
> rate - which would ensure that all kernel facilities can robustly
> handle spurious wakeups.
I have been using this from Peter to test against:
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6d77432..fdf1f68 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -214,9 +214,10 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq);
#define TASK_WAKEKILL 128
#define TASK_WAKING 256
#define TASK_PARKED 512
-#define TASK_STATE_MAX 1024
+#define TASK_YIELD 1024
+#define TASK_STATE_MAX 2048
-#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWP"
+#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWPY"
extern char ___assert_task_state[1 - 2*!!(
sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)];
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e..2c938ae 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1005,7 +1005,7 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
* ttwu() will sort out the placement.
*/
WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
- !p->on_rq);
+ !p->on_rq && !(p->state & TASK_YIELD));
#ifdef CONFIG_LOCKDEP
/*
@@ -2743,6 +2743,23 @@ static void __sched __schedule(void)
if (unlikely(signal_pending_state(prev->state, prev))) {
prev->state = TASK_RUNNING;
} else {
+
+ /*
+ * Provide an auto-yield feature on schedule().
+ *
+ * The thought is to avoid a sleep+wakeup cycle
+ * if simply yielding the cpu will suffice to
+ * satisfy the required condition.
+ *
+ * Assumes the calling schedule() site can deal
+ * with spurious wakeups.
+ */
+ if (prev->state & TASK_YIELD) {
+ prev->state &= ~TASK_YIELD;
+ if (rq->nr_running > 1)
+ goto no_deactivate;
+ }
+
deactivate_task(rq, prev, DEQUEUE_SLEEP);
prev->on_rq = 0;
@@ -2759,6 +2776,7 @@ static void __sched __schedule(void)
try_to_wake_up_local(to_wakeup);
}
}
+ no_deactivate:
switch_count = &prev->nvcsw;
}
> My guess would be that most remaining fragilities against spurious
> wakeups ought to be in the boot/init phase, so I'd keep an eye out for
> suspend/resume regressions.
Correct, which is why I'm not that concerned anymore about spurious
wakups, in fact that code above now boots and handles correctly on
rather large systems.
>
> > [...] However there is core code that cannot handle them afaict, and
> > furthermore tglx does have the point that other events can already
> > trigger them anyway.
>
> s/there is core code/there is no core code
heh yes.
Thanks,
Davidlohr
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists