Date:	Wed, 20 Aug 2014 13:47:42 +0400
From:	Kirill Tkhai <ktkhai@...allels.com>
To:	<linux-kernel@...r.kernel.org>
CC:	Peter Zijlstra <peterz@...radead.org>,
	Paul Turner <pjt@...gle.com>, Oleg Nesterov <oleg@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	"Mike Galbraith" <umgwanakikbuti@...il.com>,
	Kirill Tkhai <tkhai@...dex.ru>,
	"Tim Chen" <tim.c.chen@...ux.intel.com>,
	Ingo Molnar <mingo@...nel.org>,
	"Nicolas Pitre" <nicolas.pitre@...aro.org>
Subject: [PATCH v5 2/5] sched: Teach scheduler to understand
 TASK_ON_RQ_MIGRATING state


This is a new state which will be used to indicate that a task is in the
process of migrating between two RQs. It allows us to get rid of
double_rq_lock(), which we previously had to use to change the rq of a
queued task.

Consider an example. To move a task between src_rq and dst_rq,
we will do the following:

	raw_spin_lock(&src_rq->lock);
	/* p is a task which is queued on src_rq */
	p = ...;

	dequeue_task(src_rq, p, 0);
	p->on_rq = TASK_ON_RQ_MIGRATING;
	set_task_cpu(p, dst_cpu);
	raw_spin_unlock(&src_rq->lock);

	/*
	 * Both RQs are unlocked here.
	 * Task p is dequeued from src_rq,
	 * but its on_rq is not zero.
	 */

	raw_spin_lock(&dst_rq->lock);
	p->on_rq = TASK_ON_RQ_QUEUED;
	enqueue_task(dst_rq, p, 0);
	raw_spin_unlock(&dst_rq->lock);
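The sequence above can be modelled in user space to make the intermediate window visible. The sketch below is a hypothetical, single-threaded C11 model, not kernel code: struct rq, struct task, rq_lock(), migrate_begin() and migrate_finish() are illustrative names, a per-rq nr_running counter stands in for the real run queue, and an atomic_bool spin lock stands in for raw_spin_lock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; names are illustrative, not kernel APIs. */
#define TASK_ON_RQ_QUEUED    1
#define TASK_ON_RQ_MIGRATING 2

struct rq {
	atomic_bool locked;	/* stands in for rq->lock */
	int cpu;
	int nr_running;		/* stands in for the rq's task queue */
};

struct task {
	atomic_int on_rq;
	atomic_int cpu;		/* stands in for task_cpu(p) */
};

static void rq_lock(struct rq *rq)
{
	while (atomic_exchange_explicit(&rq->locked, true, memory_order_acquire))
		;	/* spin until the previous holder releases it */
}

static void rq_unlock(struct rq *rq)
{
	atomic_store_explicit(&rq->locked, false, memory_order_release);
}

/* First half of the move: runs under src_rq's lock only. */
static void migrate_begin(struct rq *src_rq, struct task *p, int dst_cpu)
{
	rq_lock(src_rq);
	assert(atomic_load(&p->on_rq) == TASK_ON_RQ_QUEUED);
	src_rq->nr_running--;				/* dequeue_task() */
	atomic_store(&p->on_rq, TASK_ON_RQ_MIGRATING);
	atomic_store(&p->cpu, dst_cpu);			/* set_task_cpu() */
	rq_unlock(src_rq);
	/* Both rqs are unlocked: p is on no queue, yet on_rq != 0. */
}

/* Second half of the move: runs under dst_rq's lock only. */
static void migrate_finish(struct rq *dst_rq, struct task *p)
{
	rq_lock(dst_rq);
	assert(atomic_load(&p->cpu) == dst_rq->cpu);
	atomic_store(&p->on_rq, TASK_ON_RQ_QUEUED);
	dst_rq->nr_running++;				/* enqueue_task() */
	rq_unlock(dst_rq);
}
```

Splitting the move into two functions exposes the state between them: the task is on no queue, neither lock is held, and on_rq is still non-zero.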

While p->on_rq is TASK_ON_RQ_MIGRATING, the task is considered to be
"migrating", and other scheduler operations on it are unavailable to
concurrent callers. A concurrent caller spins until the migration is
completed.
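One way to read the on_rq states is as three cases: 0 (not on any rq), TASK_ON_RQ_QUEUED and TASK_ON_RQ_MIGRATING. A migrating task is therefore "on an rq" in the broad sense (on_rq is non-zero) without being queued anywhere, which appears to be why the try_to_wake_up() hunk in this patch switches from task_on_rq_queued(p) to a plain p->on_rq test. A minimal sketch of the predicates follows; task_on_rq_queued() and task_on_rq_migrating() mirror the helpers in sched.h, while task_on_some_rq() is a hypothetical name for the non-zero check.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror the on_rq states defined in kernel/sched/sched.h. */
#define TASK_ON_RQ_QUEUED    1
#define TASK_ON_RQ_MIGRATING 2

struct task {
	int on_rq;	/* 0 means: not on any rq */
};

static bool task_on_rq_queued(const struct task *p)
{
	return p->on_rq == TASK_ON_RQ_QUEUED;
}

static bool task_on_rq_migrating(const struct task *p)
{
	return p->on_rq == TASK_ON_RQ_MIGRATING;
}

/* Hypothetical helper: queued, or in transit between two rqs. */
static bool task_on_some_rq(const struct task *p)
{
	return p->on_rq != 0;
}
```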

The unavailable operations include changing the task's CPU affinity,
changing its priority, etc.; in other words, all task-related
functionality that previously required task_rq(p)->lock.

To implement TASK_ON_RQ_MIGRATING support, we primarily rely on the
following fact: most scheduler users (the ones we are protecting a
migrating task from) use task_rq_lock() or __task_rq_lock() to take
the lock of task_rq(p). These primitives already know that the task's
CPU may change, and they keep retrying until they hold the lock of the
right RQ. We add one more condition to them, so that they also spin
until the migration is finished.
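The retry loop described above can be sketched as a hypothetical user-space model with the same shape as the patched __task_rq_lock(). Everything here is an assumption for illustration: model_task_rq_lock(), relax_hook and the demo functions are invented names, and because the model is single-threaded, cpu_relax() calls an optional hook that plays the role of the CPU completing the migration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; names are illustrative, not kernel APIs. */
#define TASK_ON_RQ_QUEUED    1
#define TASK_ON_RQ_MIGRATING 2

struct rq {
	atomic_bool locked;		/* stands in for rq->lock */
};

struct task {
	atomic_int on_rq;
	_Atomic(struct rq *) rq;	/* stands in for task_rq(p) */
};

/* Invoked while spinning; lets a single-threaded demo play the other CPU. */
static void (*relax_hook)(void);

static void cpu_relax(void)
{
	if (relax_hook)
		relax_hook();
}

static void rq_lock(struct rq *rq)
{
	while (atomic_exchange_explicit(&rq->locked, true, memory_order_acquire))
		cpu_relax();
}

static void rq_unlock(struct rq *rq)
{
	atomic_store_explicit(&rq->locked, false, memory_order_release);
}

static bool task_on_rq_migrating(struct task *p)
{
	return atomic_load(&p->on_rq) == TASK_ON_RQ_MIGRATING;
}

/* Same shape as the patched __task_rq_lock(). */
static struct rq *model_task_rq_lock(struct task *p)
{
	for (;;) {
		struct rq *rq = atomic_load(&p->rq);

		rq_lock(rq);
		if (rq == atomic_load(&p->rq) && !task_on_rq_migrating(p))
			return rq;	/* right rq, and p is not in transit */
		rq_unlock(rq);

		while (task_on_rq_migrating(p))
			cpu_relax();
	}
}

/* Demo state: p has been moved to dst_rq but is still marked MIGRATING. */
static struct rq dst_rq;
static struct task demo_task;

static void finish_migration(void)
{
	/* Simplified: the real migrating CPU would hold dst_rq's lock here. */
	atomic_store(&demo_task.on_rq, TASK_ON_RQ_QUEUED);
	relax_hook = NULL;
}

static struct rq *demo_lock_during_migration(void)
{
	atomic_store(&demo_task.rq, &dst_rq);	/* set_task_cpu() already ran */
	atomic_store(&demo_task.on_rq, TASK_ON_RQ_MIGRATING);
	relax_hook = finish_migration;
	return model_task_rq_lock(&demo_task);
}
```

In the demo, the first pass takes dst_rq's lock, sees TASK_ON_RQ_MIGRATING, drops the lock and spins; once the hook marks the task queued, the second pass returns with dst_rq locked.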

Signed-off-by: Kirill Tkhai <ktkhai@...allels.com>
---
 kernel/sched/core.c  |   12 +++++++++---
 kernel/sched/sched.h |    6 ++++++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1276ba2..cef1a13 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -317,9 +317,12 @@ static inline struct rq *__task_rq_lock(struct task_struct *p)
 	for (;;) {
 		rq = task_rq(p);
 		raw_spin_lock(&rq->lock);
-		if (likely(rq == task_rq(p)))
+		if (likely(rq == task_rq(p) && !task_on_rq_migrating(p)))
 			return rq;
 		raw_spin_unlock(&rq->lock);
+
+		while (unlikely(task_on_rq_migrating(p)))
+			cpu_relax();
 	}
 }
 
@@ -336,10 +339,13 @@ static struct rq *task_rq_lock(struct task_struct *p, unsigned long *flags)
 		raw_spin_lock_irqsave(&p->pi_lock, *flags);
 		rq = task_rq(p);
 		raw_spin_lock(&rq->lock);
-		if (likely(rq == task_rq(p)))
+		if (likely(rq == task_rq(p) && !task_on_rq_migrating(p)))
 			return rq;
 		raw_spin_unlock(&rq->lock);
 		raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
+
+		while (unlikely(task_on_rq_migrating(p)))
+			cpu_relax();
 	}
 }
 
@@ -1662,7 +1668,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	success = 1; /* we're going to change ->state */
 	cpu = task_cpu(p);
 
-	if (task_on_rq_queued(p) && ttwu_remote(p, wake_flags))
+	if (p->on_rq && ttwu_remote(p, wake_flags))
 		goto stat;
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 26566d0..aa0f73b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -17,6 +17,7 @@ struct rq;
 
 /* task_struct::on_rq states: */
 #define TASK_ON_RQ_QUEUED	1
+#define TASK_ON_RQ_MIGRATING	2
 
 extern __read_mostly int scheduler_running;
 
@@ -950,6 +951,11 @@ static inline int task_on_rq_queued(struct task_struct *p)
 	return p->on_rq == TASK_ON_RQ_QUEUED;
 }
 
+static inline int task_on_rq_migrating(struct task_struct *p)
+{
+	return p->on_rq == TASK_ON_RQ_MIGRATING;
+}
+
 #ifndef prepare_arch_switch
 # define prepare_arch_switch(next)	do { } while (0)
 #endif


