lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070919202632.GA30474@elte.hu>
Date:	Wed, 19 Sep 2007 22:26:32 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Chuck Ebbert <cebbert@...hat.com>,
	Antoine Martin <antoine@...afix.co.uk>,
	Satyam Sharma <satyam.sharma@...il.com>,
	Linux Kernel Development <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: CFS: some bad numbers with Java/database threading [FIXED]


* Ingo Molnar <mingo@...e.hu> wrote:

> ok, we can do that.
> 
> the O(1) implementation of yield() was pretty arbitrary: it did not 
> move it last on the same priority level - it only did it within the 
> active array. So expired tasks (such as CPU hogs) would come _after_ a 
> yield()-ing task.
> 
> so the yield() implementation was so much tied to the data structures 
> of the O(1) scheduler that it was impossible to fully emulate it in 
> CFS.
> 
> in CFS we dont have a per-nice-level rbtree, so we cannot move it dead 
> last within the same priority group - but we can move it dead last in 
> the whole tree. (then they'd be put even after nice +19 tasks.) People 
> might complain about _that_.
> 
> another practical problem is that this will break certain desktop apps 
> that do calls to yield() [some firefox plugins do that, some 3D apps 
> do that, etc.] but they dont expect to be moved 'very late' into the 
> queue - they expect the O(1) scheduler's behavior of being delayed "a 
> bit". (That's why i added the yield-granularity tunable.)
> 
> we can make yield super-agressive, that is pretty much the only sane 
> (because well-defined) thing to do (besides turning yield into a NOP), 
> but there will be lots of regression reports about lost interactivity 
> during load. sched_yield() is a mortally broken API. "fix the app" 
> would be the answer, but still there will be lots of complaints.

find below the fix that puts yielding tasks to the rightmost position in 
the rbtree. I have not tested it extensively yet, but it appears to work 
on x86. (i tested yield using interactive tasks and they get hurt now 
under load - but this would be expected.)

	Ingo

---------------------->
Subject: sched: make yield more agressive
From: Ingo Molnar <mingo@...e.hu>

make sys_sched_yield() more agressive, by moving the yielding task
to the last position in the rbtree.

Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
 kernel/sched.c      |    5 +----
 kernel/sched_fair.c |   39 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 35 insertions(+), 9 deletions(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -4550,10 +4550,7 @@ asmlinkage long sys_sched_yield(void)
 	struct rq *rq = this_rq_lock();
 
 	schedstat_inc(rq, yld_cnt);
-	if (unlikely(rq->nr_running == 1))
-		schedstat_inc(rq, yld_act_empty);
-	else
-		current->sched_class->yield_task(rq, current);
+	current->sched_class->yield_task(rq, current);
 
 	/*
 	 * Since we are going to call schedule() anyway, there's
Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -902,14 +902,43 @@ static void dequeue_task_fair(struct rq 
 static void yield_task_fair(struct rq *rq, struct task_struct *p)
 {
 	struct cfs_rq *cfs_rq = task_cfs_rq(p);
+	struct rb_node **link = &cfs_rq->tasks_timeline.rb_node;
+	struct sched_entity *rightmost, *se = &p->se;
+	struct rb_node *parent;
 
-	__update_rq_clock(rq);
 	/*
-	 * Dequeue and enqueue the task to update its
-	 * position within the tree:
+	 * Are we the only task in the tree?
 	 */
-	dequeue_entity(cfs_rq, &p->se, 0);
-	enqueue_entity(cfs_rq, &p->se, 0);
+	if (unlikely(cfs_rq->nr_running == 1))
+		return;
+	/*
+	 * Find the rightmost entry in the rbtree:
+	 */
+	do {
+		parent = *link;
+		link = &parent->rb_right;
+	} while (*link);
+
+	rightmost = rb_entry(parent, struct sched_entity, run_node);
+	/*
+	 * Already in the rightmost position?
+	 */
+	if (unlikely(rightmost == se))
+		return;
+
+	/*
+	 * Minimally necessary key value to be last in the tree:
+	 */
+	se->fair_key = rightmost->fair_key + 1;
+
+	if (cfs_rq->rb_leftmost == &se->run_node)
+		cfs_rq->rb_leftmost = rb_next(&se->run_node);
+	/*
+	 * Relink the task to the rightmost position:
+	 */
+	rb_erase(&se->run_node, &cfs_rq->tasks_timeline);
+	rb_link_node(&se->run_node, parent, link);
+	rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
 }
 
 /*

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ