linux-kernel - Re: [patch] CFS scheduler, v3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070421075401.GA23410@elte.hu>
Date:	Sat, 21 Apr 2007 09:54:01 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Peter Williams <pwil3058@...pond.net.au>
Cc:	William Lee Irwin III <wli@...omorphy.com>,
	linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Con Kolivas <kernel@...ivas.org>,
	Nick Piggin <npiggin@...e.de>, Mike Galbraith <efault@....de>,
	Arjan van de Ven <arjan@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>, caglar@...dus.org.tr,
	Willy Tarreau <w@....eu>, Gene Heskett <gene.heskett@...il.com>
Subject: Re: [patch] CFS scheduler, v3

* Peter Williams <pwil3058@...pond.net.au> wrote:

> I retract this suggestion as it's a very bad idea.  It introduces the 
> possibility of starvation via the poor sods at the bottom of the queue 
> having their "on CPU" forever postponed and we all know that even the 
> smallest possibility of starvation will eventually cause problems.
> 
> I think there should be a rule: Once a task is on the queue its "on 
> CPU" time is immutable.

Yeah, fully agreed. Currently i'm using the simple method of 
p->nice_offset, which plainly just moves the per nice level areas of the 
tree far enough apart (by a constant offset) so that lower nice levels 
rarely interact with higher nice levels. Lower nice levels never truly 
starve because rq->fair_clock increases deterministically and currently 
the fair_key values are indeed 'immutable' as you suggest.

In practice they can starve a bit when one renices thousands of tasks, 
so i was thinking about the following special-case: to at least make 
them easily killable: if a nice 0 task sends a SIGKILL to a nice 19 task 
then we could 'share' its p->wait_runtime with that nice 19 task and 
copy the signal sender's nice_offset. This would in essence pass the 
right to execute over to the killed task, so that it can tear itself 
down.

This cannot be used to gain an 'unfair advantage' because the signal 
sender spends its own 'right to execute on the CPU', and because the 
target task cannot execute any user code anymore when it gets a SIGKILL.

In any case, it is clear that rq->raw_cpu_load should be used instead of 
rq->nr_running, when calculating the fair clock, but i begin to like the 
nice_offset solution too in addition of this: it's effective in practice 
and starvation-free in theory, and most importantly, it's very simple. 
We could even make the nice offset granularity tunable, just in case 
anyone wants to weaken (or strengthen) the effectivity of nice levels. 
What do you think, can you see any obvious (or less obvious) 
showstoppers with this approach?

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/