Date:	Mon, 5 May 2008 23:05:26 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Arjan van de Ven <arjan@...radead.org>,
	Sam Ravnborg <sam@...nborg.org>,
	Parag Warudkar <parag.warudkar@...il.com>,
	LKML <linux-kernel@...r.kernel.org>,
	"akpm@...l.org" <akpm@...ux-foundation.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Dave Jones <davej@...hat.com>
Subject: Re: [PATCH] default to n for GROUP_SCHED and FAIR_GROUP_SCHED


* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> Another example of that kind of behaviour, for example, is just you 
> fighting turning off 'default y' from FAIR_GROUP_SCHED, considering 
> that it is known to cause latency problems and the reason isn't 
> understood.

a side-note on this topic: after looking at a bunch of traces and 
after a lot of testing, the latency problems turned out to be complex, 
but they are now reasonably well understood.

Nevertheless we'll mark it default-disabled, because it has been 
taking too long to create and propagate the fixes - I've queued up a 
patch for that. We might even mark it BROKEN for a single release so 
that the option disappears from people's configs, or we could rename 
it to achieve a similar effect.
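
(for illustration, hiding the option that way would use the usual 
Kconfig idiom of depending on the never-enabled BROKEN symbol - this 
is a hypothetical fragment, not the actual queued patch:

config FAIR_GROUP_SCHED
	bool "Fair group CPU scheduler"
	depends on GROUP_SCHED && BROKEN
	default n

with "depends on BROKEN" the option can never be selected, so it 
vanishes from oldconfig runs without removing the code.)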

The main design-level latency source was the hierarchical nature of 
group scheduling: we had a hierarchy of runqueues, and CFS met the 
latency targets only per level (per runqueue) of that hierarchy. So 
with every new level we got more maximum latency.

So for example on a system with fair user scheduling, it takes just a 
couple of different UIDs being active at once to produce bad latency: 
if the root, nobody, distcc and mingo UIDs are all active at once, a 
mingo task could see a 4x latency hit over the target - 160 msecs 
instead of 40 msecs.

This is now believed to be fixed in sched-devel.git, via the "single 
runqueue" and deadline-scheduling patches from Peter that flatten the 
hierarchy of the group scheduler.

Another latency source was sched_clock() skew - if the clock runs too 
slowly, say at 10% of its intended speed, the scheduler turns a 
40 msec intended latency target into a 400 msec one!
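
(the same effect as a trivial sketch, again illustrative userspace C, 
assuming a clock that advances at a fixed fraction of real time:

#include <stdio.h>

int main(void)
{
	double clock_speed = 0.10;	/* clock at 10% of real speed */
	double target_msec = 40.0;	/* intended latency target */

	/*
	 * the scheduler measures 40 msec on the slow clock, which
	 * corresponds to 10x as much wall-clock time:
	 */
	printf("effective target: %.0f msec\n",
	       target_msec / clock_speed);	/* 400 msec */
	return 0;
}

)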

This bug too is now believed to be fixed via Peter's new sched_clock 
code in sched-devel.git.

... and users now have a very objective stick they can use on us: 
latencytop. It told us in black and white when we sucked. (I am 
waiting for the day when it auto-creates a scheduler trace for the 
worst latency hit in the system, making it easy for users to submit 
traces.)

	Ingo
