Message-ID: <20161024015726.GA26130@linux-80c1.suse>
Date: Sun, 23 Oct 2016 18:57:26 -0700
From: Davidlohr Bueso <dave@...olabs.net>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Will Deacon <will.deacon@....com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Waiman Long <waiman.long@....com>,
Jason Low <jason.low2@....com>,
Ding Tianhong <dingtianhong@...wei.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Imre Deak <imre.deak@...el.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Terry Rudd <terry.rudd@....com>,
"Paul E. McKenney" <paulmck@...ibm.com>,
Jason Low <jason.low2@...com>,
Chris Wilson <chris@...is-wilson.co.uk>,
Daniel Vetter <daniel.vetter@...ll.ch>,
kent.overstreet@...il.com, axboe@...com,
linux-bcache@...r.kernel.org
Subject: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex:
Restructure wait loop)
On Wed, 19 Oct 2016, Peter Zijlstra wrote:
>Subject: sched: Better explain sleep/wakeup
>From: Peter Zijlstra <peterz@...radead.org>
>Date: Wed Oct 19 15:45:27 CEST 2016
>
>There were a few questions wrt how sleep-wakeup works. Try and explain
>it more.
>
>Requested-by: Will Deacon <will.deacon@....com>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>---
> include/linux/sched.h | 52 ++++++++++++++++++++++++++++++++++----------------
> kernel/sched/core.c | 15 +++++++-------
> 2 files changed, 44 insertions(+), 23 deletions(-)
>
>--- a/include/linux/sched.h
>+++ b/include/linux/sched.h
>@@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!
> #define set_task_state(tsk, state_value) \
> do { \
> (tsk)->task_state_change = _THIS_IP_; \
>- smp_store_mb((tsk)->state, (state_value)); \
>+ smp_store_mb((tsk)->state, (state_value)); \
> } while (0)
>
>-/*
>- * set_current_state() includes a barrier so that the write of current->state
>- * is correctly serialised wrt the caller's subsequent test of whether to
>- * actually sleep:
>- *
>- * set_current_state(TASK_UNINTERRUPTIBLE);
>- * if (do_i_need_to_sleep())
>- * schedule();
>- *
>- * If the caller does not need such serialisation then use __set_current_state()
>- */
> #define __set_current_state(state_value) \
> do { \
> current->task_state_change = _THIS_IP_; \
>@@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!
> #define set_current_state(state_value) \
> do { \
> current->task_state_change = _THIS_IP_; \
>- smp_store_mb(current->state, (state_value)); \
>+ smp_store_mb(current->state, (state_value)); \
> } while (0)
>
> #else
>
>+/*
>+ * @tsk had better be current, or you get to keep the pieces.
That reminds me, we were getting rid of the set_task_state() calls. Bcache was
pending, being the only user in the kernel that doesn't actually use current;
instead it changes the state of the newly created (yet blocked/uninterruptible)
garbage collection kthread. I cannot figure out why this is done (i.e. purposely
affecting the load avg. accounting). Furthermore, gc obviously kicks in only in
very specific scenarios, such as by the allocator task, so I don't see why
bcache gc should want to be interruptible.
Kent, Jens, can we get rid of this?
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 76f7534d1dd1..6e3c358b5759 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1798,7 +1798,6 @@ int bch_gc_thread_start(struct cache_set *c)
 	if (IS_ERR(c->gc_thread))
 		return PTR_ERR(c->gc_thread);
-	set_task_state(c->gc_thread, TASK_INTERRUPTIBLE);
 	return 0;
 }
Thanks,
Davidlohr
>+ *
>+ * The only reason is that computing current can be more expensive than
>+ * using a pointer that's already available.
>+ *
>+ * Therefore, see set_current_state().
>+ */
> #define __set_task_state(tsk, state_value) \
> do { (tsk)->state = (state_value); } while (0)
> #define set_task_state(tsk, state_value) \
>@@ -299,11 +296,34 @@ extern char ___assert_task_state[1 - 2*!
> * is correctly serialised wrt the caller's subsequent test of whether to
> * actually sleep:
> *
>+ * for (;;) {
> * set_current_state(TASK_UNINTERRUPTIBLE);
>- * if (do_i_need_to_sleep())
>- * schedule();
>+ * if (!need_sleep)
>+ * break;
>+ *
>+ * schedule();
>+ * }
>+ * __set_current_state(TASK_RUNNING);
>+ *
>+ * If the caller does not need such serialisation (because, for instance, the
>+ * condition test and condition change and wakeup are under the same lock) then
>+ * use __set_current_state().
>+ *
>+ * The above is typically ordered against the wakeup, which does:
>+ *
>+ * need_sleep = false;
>+ * wake_up_state(p, TASK_UNINTERRUPTIBLE);
>+ *
>+ * Where wake_up_state() (and all other wakeup primitives) imply enough
>+ * barriers to order the store of the variable against wakeup.
>+ *
>+ * Wakeup will do: if (@state & p->state) p->state = TASK_RUNNING, that is,
>+ * once it observes the TASK_UNINTERRUPTIBLE store the waking CPU can issue a
>+ * TASK_RUNNING store which can collide with __set_current_state(TASK_RUNNING).
>+ *
>+ * This is obviously fine, since they both store the exact same value.
> *
>- * If the caller does not need such serialisation then use __set_current_state()
>+ * Also see the comments of try_to_wake_up().
> */
> #define __set_current_state(state_value) \
> do { current->state = (state_value); } while (0)
>--- a/kernel/sched/core.c
>+++ b/kernel/sched/core.c
>@@ -2000,14 +2000,15 @@ static void ttwu_queue(struct task_struc
> * @state: the mask of task states that can be woken
> * @wake_flags: wake modifier flags (WF_*)
> *
>- * Put it on the run-queue if it's not already there. The "current"
>- * thread is always on the run-queue (except when the actual
>- * re-schedule is in progress), and as such you're allowed to do
>- * the simpler "current->state = TASK_RUNNING" to mark yourself
>- * runnable without the overhead of this.
>+ * If (@state & @p->state) @p->state = TASK_RUNNING.
> *
>- * Return: %true if @p was woken up, %false if it was already running.
>- * or @state didn't match @p's state.
>+ * If the task was not queued/runnable, also place it back on a runqueue.
>+ *
>+ * Atomic against schedule() which would dequeue a task, also see
>+ * set_current_state().
>+ *
>+ * Return: %true if @p->state changes (an actual wakeup was done),
>+ * %false otherwise.
> */
> static int
> try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
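
The wait loop and wakeup ordering described in the quoted set_current_state() comment can be modelled in user space. What follows is only a rough sketch, not kernel code: it assumes a C11 seq_cst store is an acceptable stand-in for smp_store_mb(), and fake_schedule(), fake_wake_up_state(), task_state and run_sleep_wakeup_demo() are hypothetical names invented for this illustration. Only the two state values mirror the kernel's definitions.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same numeric values as the kernel's task states; everything else is made up. */
enum { TASK_RUNNING = 0, TASK_UNINTERRUPTIBLE = 2 };

static atomic_int  task_state = TASK_RUNNING;	/* models p->state */
static atomic_bool need_sleep = true;		/* the wait condition */

/* Hypothetical model of schedule(): stay "asleep" until marked runnable. */
static void fake_schedule(void)
{
	while (atomic_load(&task_state) != TASK_RUNNING)
		sched_yield();
}

/* Hypothetical model of: if (@state & p->state) p->state = TASK_RUNNING */
static bool fake_wake_up_state(int mask)
{
	int old = atomic_load(&task_state);

	while (old & mask) {
		/* On failure, 'old' is refreshed and the mask is re-checked. */
		if (atomic_compare_exchange_weak(&task_state, &old, TASK_RUNNING))
			return true;
	}
	return false;	/* state did not match, no wakeup done */
}

static void *waiter(void *arg)
{
	(void)arg;
	for (;;) {
		/* seq_cst store: assumed stand-in for smp_store_mb() */
		atomic_store(&task_state, TASK_UNINTERRUPTIBLE);
		if (!atomic_load(&need_sleep))
			break;

		fake_schedule();
	}
	/* Relaxed store, in the spirit of __set_current_state(TASK_RUNNING). */
	atomic_store_explicit(&task_state, TASK_RUNNING, memory_order_relaxed);
	assert(!atomic_load(&need_sleep));
	return NULL;
}

static void *waker(void *arg)
{
	(void)arg;
	atomic_store(&need_sleep, false);		/* change the condition... */
	fake_wake_up_state(TASK_UNINTERRUPTIBLE);	/* ...then wake */
	return NULL;
}

/* Returns the final task state (TASK_RUNNING on success), or -1 on error. */
int run_sleep_wakeup_demo(void)
{
	pthread_t w, k;

	atomic_store(&need_sleep, true);
	atomic_store(&task_state, TASK_RUNNING);
	if (pthread_create(&w, NULL, waiter, NULL))
		return -1;
	if (pthread_create(&k, NULL, waker, NULL)) {
		/* Let the waiter leave its loop before bailing out. */
		atomic_store(&need_sleep, false);
		atomic_store(&task_state, TASK_RUNNING);
		pthread_join(w, NULL);
		return -1;
	}
	pthread_join(w, NULL);
	pthread_join(k, NULL);
	return atomic_load(&task_state);
}
```

Build with -pthread and call run_sleep_wakeup_demo() from a main(). Either the waiter observes need_sleep == false after publishing its state, or the waker observes TASK_UNINTERRUPTIBLE and stores TASK_RUNNING, so the wakeup cannot be lost. The waiter's final TASK_RUNNING store may collide with the waker's, which is harmless since both write the same value.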