linux-kernel - Re: [patch] CFS (Completely Fair Scheduler), v2

Message-Id: <200704170151.09244.gene.heskett@gmail.com>
Date:	Tue, 17 Apr 2007 01:51:08 -0400
From:	Gene Heskett <gene.heskett@...il.com>
To:	Willy Tarreau <w@....eu>
Cc:	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Con Kolivas <kernel@...ivas.org>,
	Nick Piggin <npiggin@...e.de>, Mike Galbraith <efault@....de>,
	Arjan van de Ven <arjan@...radead.org>,
	Peter Williams <pwil3058@...pond.net.au>,
	Thomas Gleixner <tglx@...utronix.de>, caglar@...dus.org.tr,
	Dmitry Adamushko <dmitry.adamushko@...il.com>
Subject: Re: [patch] CFS (Completely Fair Scheduler), v2

On Tuesday 17 April 2007, Willy Tarreau wrote:
>Hi Gene,
>
>On Tue, Apr 17, 2007 at 12:53:56AM -0400, Gene Heskett wrote:
>> On Monday 16 April 2007, Ingo Molnar wrote:
>> >this is the second release of the CFS (Completely Fair Scheduler)
>> >patchset, against v2.6.21-rc7:
>> >
>> >   http://redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.patch
>> >
>> >i'd like to thank everyone for the tremendous amount of feedback and
>> >testing the v1 patch got - i could hardly keep up with just reading the
>> >mails! Some of the stuff people addressed i couldn't implement yet, i
>> >mostly concentrated on bugs, regressions and debuggability.
>> >
>> >there's a fair amount of churn:
>> >
>> >   15 files changed, 456 insertions(+), 241 deletions(-)
>> >
>> >But it's an encouraging sign that there was no crash bug found in v1;
>> >all the bugs were related to scheduling-behavior details. The code was
>> >tested on 3 architectures so far: i686, x86_64 and ia64. Most of the
>> >code size increase in -v2 is due to debugging helpers, which will be
>> >removed later. (The new /proc/sched_debug file can be used to see the
>> >fine details of CFS scheduling.)
>> >
>> >Changes since -v1:
>> >
>> > - make nice levels less starvable. (reported by Willy Tarreau)
>> >
>> > - fixed child-runs first. A /proc/sys/kernel/sched_child_runs_first
>> >   flag can be used to turn it on/off. (This might fix the Kaffeine bug
>> >   reported by S.Çağlar Onur)
>> >
>> > - changed SCHED_FAIR back to SCHED_NORMAL (suggested by Con Kolivas)
>> >
>> > - UP build fix. (reported by Gabriel C)
>> >
>> > - timer tick micro-optimization (Dmitry Adamushko)
>> >
>> > - preemption fix: sched_class->check_preempt_curr method to decide
>> >   whether to preempt after a wakeup (or at a timer tick). (Found via a
>> >   fairness-test-utility written for CFS by Mike Galbraith)
>> >
>> > - start forked children with neutral statistics instead of trying to
>> >   inherit them from the parent: Willy Tarreau reported that this
>> >   results in better behavior on extreme workloads, and it also
>> >   simplifies the code quite nicely. Removed sched_exit() and the
>> >   ->task_exit() methods.
>> >
>> > - make nice levels independent of the sched_granularity value
>> >
>> > - new /proc/sched_debug file listing runqueue details and the rbtree
>> >
>> > - new SCH-* fields in /proc/<NR>/status to see scheduling details
>> >
>> > - new cpu-hog feature (off by default) and sysctl tunable to set it:
>> >   /proc/sys/kernel/sched_max_hog_history_ns tunable defaults to
>> >   0 (off). Positive values are meant as the maximum 'memory' that the
>> >   scheduler has of CPU hogs.
>> >
>> > - various code cleanups
>> >
>> > - added more statistics temporarily: sum_exec_runtime,
>> >   sum_wait_runtime.
>> >
>> > - added -CFS-v2 to EXTRAVERSION
>> >
>> >as usual, any sort of feedback, bugreports, fixes and suggestions are
>> >more than welcome,
>> >
>> >	Ingo
>>
>> This one (v2-rc2) is not a keeper, I'm sorry to say, Ingo.  v2-rc0 was much
>> better.  Watching amanda run with htop, kmail's composer is being subjected
>> to 5 to 10 second pauses, htop says that gzip --best isn't getting more
>> than 15% of the CPU, and the /amandatapes drive is being written to in a
>> regular pattern that, according to gkrellm, seems to be the cause of the
>> pauses.  gkrellm also seems to track the size of the writes, and can show
>> anything from 4.3k to 54 megs being written in one cycle of its screen
>> update.
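Ingo's changelog names two new sysctl tunables, /proc/sys/kernel/sched_child_runs_first and /proc/sys/kernel/sched_max_hog_history_ns, which exist only on a kernel running the CFS-v2 patch. The Python sketch that follows shows one way to read and set them through procfs. The helper names (sysctl_path, read_sysctl, write_sysctl) are hypothetical, and the sketch assumes the child-runs-first flag takes 1 for on and 0 for off.

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")  # procfs view of the sysctl tree


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name such as 'kernel.sched_child_runs_first'
    to its procfs file, /proc/sys/kernel/sched_child_runs_first."""
    return PROC_SYS.joinpath(*name.split("."))


def read_sysctl(name: str) -> str:
    """Return the current value of a sysctl as a stripped string."""
    return sysctl_path(name).read_text().strip()


def write_sysctl(name: str, value: int) -> None:
    """Write an integer value to a sysctl; needs root."""
    sysctl_path(name).write_text(f"{value}\n")


if __name__ == "__main__":
    # Both knobs come from the CFS-v2 changelog; on other kernels the
    # files simply do not exist.
    for knob in ("kernel.sched_child_runs_first",
                 "kernel.sched_max_hog_history_ns"):
        try:
            print(knob, "=", read_sysctl(knob))
        except OSError as err:
            print(knob, "unavailable:", err.strerror)
```

As root, write_sysctl("kernel.sched_child_runs_first", 0) would then turn child-runs-first off, under the 1/0 assumption above.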

Somewhat related to this, I have amanda doing a verify phase too.  During 
the verify phase (and while I was waiting for gmail to transmit this message, 
which took 30 minutes to show up on the list) I noted that when 
amrestore fired up, it and its child tar were only taking about 20% of the 
CPU between them, and that /dev/hdd was showing a pretty steady 55 to 
75MB/sec being read.  As to what this tells us, I'm not going to hazard a 
guess; at this time of night here in WV, USA, it wouldn't even be a 
SWAG.  It's coming up on 2am and the toothpicks holding my eyes open are 
sagging badly, making creaking noises even.

>Have you tried the previous version with the fair-fork patch? It might be
> possible that your workload is sensitive to the fork()ed child getting much
> CPU upon startup.

Willy, I think that patch went by and was followed by v2-rc2 so fast that 
I never got a chance to try it with the v2-rc0 framework, so the answer 
there is probably no.  I never saw a problem with v2-rc0, but Ingo 
shot me a message about it without enough detail for me to test 
for it.

FWIW, I've been using the CFQ I/O scheduler for quite a while; is it time I 
gave the AS or Deadline schedulers another check?  They are all built in, but I 
don't know how to change the default on the fly, or even if it can be done.
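On 2.6 kernels the I/O scheduler (elevator) can be switched per device at runtime by writing its name to /sys/block/<dev>/queue/scheduler, for example `echo deadline > /sys/block/hdd/queue/scheduler`. Reading that file lists the built-in schedulers with the active one in brackets, and the boot-time default is chosen with the elevator= kernel parameter. The Python sketch that follows wraps the sysfs file; parse_schedulers and set_scheduler are hypothetical helper names, and the device name hdd is simply taken from the /dev/hdd mentioned above.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple


def parse_schedulers(text: str) -> Tuple[List[str], Optional[str]]:
    """Split the contents of /sys/block/<dev>/queue/scheduler into the
    available elevators and the active one, which is shown in brackets."""
    names: List[str] = []
    active: Optional[str] = None
    for token in text.split():
        match = re.fullmatch(r"\[(.+)\]", token)
        if match:
            active = match.group(1)
            names.append(active)
        else:
            names.append(token)
    return names, active


def set_scheduler(dev: str, name: str) -> None:
    """Switch the elevator of one block device at runtime; needs root.
    Equivalent to: echo <name> > /sys/block/<dev>/queue/scheduler"""
    path = Path("/sys/block") / dev / "queue" / "scheduler"
    names, _ = parse_schedulers(path.read_text())
    if name not in names:
        raise ValueError(f"{name!r} is not available: {names}")
    path.write_text(name)


if __name__ == "__main__":
    try:
        print(parse_schedulers(Path("/sys/block/hdd/queue/scheduler").read_text()))
    except OSError:
        print("no /dev/hdd on this machine")
```

For example, parse_schedulers("noop anticipatory deadline [cfq]") returns (['noop', 'anticipatory', 'deadline', 'cfq'], 'cfq'), and set_scheduler("hdd", "deadline") would switch that one disk without a reboot.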

>Ingo, maybe I'm saying something stupid, but in my userland scheduler, when
>new tasks are "forked", they are queued at the end of the run queue with a
>fixed priority. In our case, this would translate into assigning them the
>same prio and timeslice as their parent, but queuing them at the end so that
>they don't make existing tasks starve during huge fork() loads.
>
>I don't know how that would be possible (nor whether it would help at
> all), but I found it was a good compromise compared to sharing the
> timeslice with the parent. Perhaps we should have some absolute timeslice
> and some relative timeslice (e.g. X percent of total time divided by the
> number of tasks)?
>
>Regards,
>Willy
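Willy's fork policy (children inherit the parent's priority and timeslice but join the tail of the run queue) can be modeled with a toy FIFO queue. This is a hypothetical Python sketch, not code from his userland scheduler or from CFS. The blended_timeslice_ms helper is only one possible reading of his "absolute plus relative timeslice" idea, taking the relative part as pct percent of a scheduling period divided by the number of runnable tasks.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    prio: int
    timeslice_ms: int


class RunQueue:
    """Toy round-robin run queue (illustration only)."""

    def __init__(self) -> None:
        self.queue = deque()  # runnable Task objects, head runs next

    def enqueue(self, task: Task) -> None:
        self.queue.append(task)

    def fork(self, parent: Task, child_name: str) -> Task:
        # The child copies the parent's prio and timeslice unchanged, but
        # goes to the tail, so a burst of fork() calls cannot push ahead
        # of tasks that were already waiting.
        child = Task(child_name, parent.prio, parent.timeslice_ms)
        self.queue.append(child)
        return child

    def pick_next(self) -> Task:
        # Run the head task, then requeue it at the tail.
        task = self.queue.popleft()
        self.queue.append(task)
        return task


def blended_timeslice_ms(base_ms: float, period_ms: float,
                         pct: float, nr_tasks: int) -> float:
    """Hypothetical slice: a fixed base plus pct% of the period shared
    equally among nr_tasks runnable tasks."""
    return base_ms + period_ms * pct / 100 / max(nr_tasks, 1)


if __name__ == "__main__":
    rq = RunQueue()
    a = Task("A", prio=0, timeslice_ms=10)
    rq.enqueue(a)
    rq.enqueue(Task("B", prio=0, timeslice_ms=10))
    rq.fork(a, "A.1")
    print([t.name for t in rq.queue])           # ['A', 'B', 'A.1']
    print(blended_timeslice_ms(2, 100, 20, 4))  # 7.0
```

Queuing at the tail bounds how much a fork storm can delay existing tasks: each new child waits one full round behind everything already runnable, while still starting with the same share as its parent.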

Thanks Willy.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
"I take Him shopping with me. I say, 'OK, Jesus, help me find a bargain'" 
--Tammy Faye Bakker