linux-kernel - Re: is RSDL an "unfair" scheduler too?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45FCB0A4.8070205@tmr.com>
Date:	Sat, 17 Mar 2007 22:23:16 -0500
From:	Bill Davidsen <davidsen@....com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Con Kolivas <kernel@...ivas.org>,
	Serge Belyshev <belyshev@...ni.sinp.msu.ru>,
	Al Boldi <a1426z@...ab.com>, Mike Galbraith <efault@....de>,
	linux-kernel@...r.kernel.org, Nicholas Miell <nmiell@...cast.net>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>, vishpat@...il.com
Subject: Re: is RSDL an "unfair" scheduler too?

Ingo Molnar wrote:

> * Con Kolivas <kernel@...ivas.org> wrote:
> 
>> Ok but please look at how it appears from my end (illness aside).
> 
> ( i really think we should continue this debate after you get better. 
>   Everything looks much darker when you are ill! )
> 
>> You initially said you were pleased with this design.
> 
> I said that 2 years ago about the staircase scheduler and i am still 
> saying this about RSDL today. That doesnt make my position automatically 
> correct though :-) For example i wrote and maintained the 4g:4g patchset 
> for over 2 years and still that was no guarantee of it making sense 
> upstream ;) And it was a hell of a lot of work (much uglier and nastier 
> work than any scheduler hacking and tuning, believe me), and it was 
> thrown away as a cute but unnecessary complication we dont need. So 
> what? To me what matters is the path you walk, not the destination you 
> reach.
> 
> in terms of RSDL design, i like it, still i'm kind of asking myself 
> 'couldnt something in this direction be done in a much simpler and less 
> revolutionary way'? For example couldnt we introduce per-priority level 
> timeslice quotas in the current scheme as well, instead of the very 
> simplistic and crude STARVATION_LIMIT approach? Furthermore, couldnt we 
> make the timeslices become smaller as the runqueue length increases, to 
> make starvation less of a problem? It seems like the problem cases with 
> the current scheduler arent so much centered around the interactivity 
> estimator, it is more that timeslices get distributed too coarsely, 
> while RSDL distributes timeslices in a more finegrained way and is thus 
> less suspect to starvation under certain workloads.

Yes. The "doorknob scheduler" was a scheduler which worked as follows: 
the runnable processes in the system were put in a priority sorted list 
and counted. Then the length of one cycle (turn) was divided by the 
number of processes and that was the timeslice. In the next version 
upper and lower limits were put on the length of a timeslice, so the 
system didn't get eaten by context switches under load or be jerky under 
light load. That worked fairly well.

Then people got into creating unfairness to address what they thought 
were corner cases, the code turned into a plumber's nightmare, and 
occasional jackpot cases created occasional (non-reproducible) hangs.

Finally management noted that the peripherals cost three times as much 
as the CPU, so jobs doing i/o should run first. That made batch run like 
the clappers of hell, and actually didn't do all the bad things you 
might expect. User input was waitio, disk was waitio, response was about 
as good as it could be for that hardware.

===> it might be useful to give high priority to a process going from 
waitio to runable state, once only.

The "doorknob" term came from "everybody gets a turn" and the year was 1970.

Avi Kivity wrote:
> Well, the heuristic here is that process == job. I'm not sure
> heuristic is the right name for it, but it does point out a
> deficieny.
 >
> A cpu-bound process with many threads will overwhelm a cpu-bound
> single threaded threaded process.
 >
 > A job with many processes will overwhelm a job with a single process.
 >
 > A user with many jobs can starve a user with a single job.
 >
> I don't think the problem here is heuristics, rather that the 
> scheduler's manages cpu quotas at the task level rather than at the
> user visible level. If scheduling were managed at all three
> hierarchies I mentioned ('job' is a bit artificial, but process and
> user are not) then:
 >
> - if N users are contending for the cpu on a multiuser machine, each 
> should get just 1/N of available cpu power. As it is, a user can run
> a few of your #1 workloads (or a make -j 20) and slow every other
> user down - your example would work perfectly (if we can communicate
> to the kernel what a job is)
 >
 > - multi-threaded processes would not get an unfair advantage

If we wanted to do this, a job would be defined as all children or 
threads of the oldest parent process with a PPID of one. So if I logged 
on  and did
   make -j4
on a kernel, and someone else did:
   find /var -type f | xargs grep -l zumblegarfe
and someone else was doing:
   foo & mumble & barfe

We would all be equal. That's good! And there would be some recursive 
scheduler which would pick a "job" and then a process, and run it. That 
too is good!

But we have a mail server, and there are 671 threads with a socket and 
POP3 user on each one, and they only get 1/N of a job worth of CPU 
between them, and that sucks rocks off the bottom of the ocean. So 
pretty soon the code gets some "fixes" to make POP3 work, and X work, 
and the code once again becomes plumber's nightmare... Then you start 
playing with process groups to address some of this, and address exec() 
corner cases, and complexity goes way up again. This is NOT an easy problem!

I think Con has done a good job, I think in most cases (heresy) the 
current mainline scheduler does a pretty good job. I'm in favor of 
having several schedulers, because I don't believe any one can match the 
behavior goals of all users.

-- 
Bill Davidsen <davidsen@....com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/