linux-kernel - Re: [BUG] How to get real-time priority using idle priority

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Thu, 15 Jan 2009 13:54:51 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Mike Galbraith <efault@....de>, Brian Rogers <brian@...w.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [BUG] How to get real-time priority using idle priority


* Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:

> > > Aha.  Yeah, I'll re-test with that instead.
> > 
> > Works a treat.
> 
> *cheer* lets get this merged asap, and CC -stable as well.

I've created the delta patch below for sched/urgent - i've been testing 
the previous version already. Mike, do you agree with this splitup?

	Ingo

--------------->
>From 7bad8c0618dc32ecf7632b7d5abfb21645ee6b60 Mon Sep 17 00:00:00 2001
From: Mike Galbraith <efault@....de>
Date: Thu, 15 Jan 2009 10:28:43 +0100
Subject: [PATCH] sched: fix SCHED_IDLE latency/starvation, v2

The below seems to cure all of the problems I've encountered.

The rate of advance thing in set_task_cpu() seems to have been a case of
fixing the symptom instead of the problem.  Perhaps this needs more
thought, but my box says "Red-Herring" ;-)

The real problem (excluding the SCHED_IDLE specific problems), is that
update_min_vruntime() doesn't work quite as intended, and will slam
min_vruntime far right if load balancing etc places a task which is far
right of the currently running task on the runqueue.  If the currently
running, and up to this point the min_vruntime pace setter, is a hog,
any task waking to this runqueue after min_vruntime leaps forward has to
wait for the hog to consume the gap.  In the case of SCHED_IDLE tasks,
that gap can be huge, but even with a nice 19 tasks it can be quite
large and painful.

Removing the if (vruntime == cfs_rq->min_vruntime) test, which will be
true if the currently running task is the pace setter, cured it for me.

Signed-off-by: Mike Galbraith <efault@....de>
Acked-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc: <stable@...nel.org>
Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
 kernel/sched.c      |   23 ++++-------------------
 kernel/sched_fair.c |   13 +++++++------
 2 files changed, 11 insertions(+), 25 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index e551bb6..b087c7f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1323,8 +1323,8 @@ static inline void update_load_sub(struct load_weight *lw, unsigned long dec)
  * slice expiry etc.
  */
 
-#define WEIGHT_IDLEPRIO		2
-#define WMULT_IDLEPRIO		(1 << 31)
+#define WEIGHT_IDLEPRIO		3
+#define WMULT_IDLEPRIO		1431655765
 
 /*
  * Nice levels are multiplicative, with a gentle 10% change for every
@@ -1891,23 +1891,8 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 			schedstat_inc(p, se.nr_forced2_migrations);
 	}
 #endif
-	if (old_cpu != new_cpu) {
-		s64 delta = p->se.vruntime - old_cfsrq->min_vruntime;
-
-		/*
-		 * min_vruntimes may be advancing at wildly different
-		 * rates, so we must scale the delta accordingly.
-		 */
-		if (new_cfsrq->load.weight != old_cfsrq->load.weight) {
-			int negative = delta < 0;
-
-			delta = negative ? -delta : delta;
-			delta = calc_delta_mine(delta,
-				new_cfsrq->load.weight, &old_cfsrq->load);
-			delta = negative ? -delta : delta;
-		}
-		p->se.vruntime = new_cfsrq->min_vruntime + delta;
-	}
+	p->se.vruntime -= old_cfsrq->min_vruntime -
+					 new_cfsrq->min_vruntime;
 
 	__set_task_cpu(p, new_cpu);
 }
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 500ed14..761071d 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -283,10 +283,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 						   struct sched_entity,
 						   run_node);
 
-		if (vruntime == cfs_rq->min_vruntime)
-			vruntime = se->vruntime;
-		else
-			vruntime = min_vruntime(vruntime, se->vruntime);
+		vruntime = min_vruntime(vruntime, se->vruntime);
 	}
 
 	cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
@@ -677,9 +674,13 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 			unsigned long thresh = sysctl_sched_latency;
 
 			/*
-			 * convert the sleeper threshold into virtual time
+			 * Convert the sleeper threshold into virtual time.
+			 * SCHED_IDLE is a special sub-class.  We care about
+			 * fairness only relative to other SCHED_IDLE tasks,
+			 * all of which have the same weight.
 			 */
-			if (sched_feat(NORMALIZED_SLEEPER))
+			if (sched_feat(NORMALIZED_SLEEPER) &&
+					task_of(se)->policy != SCHED_IDLE)
 				thresh = calc_delta_fair(thresh, se);
 
 			vruntime -= thresh;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/