Message-ID: <45FCB0A4.8070205@tmr.com>
Date: Sat, 17 Mar 2007 22:23:16 -0500
From: Bill Davidsen <davidsen@....com>
To: Ingo Molnar <mingo@...e.hu>
CC: Con Kolivas <kernel@...ivas.org>,
Serge Belyshev <belyshev@...ni.sinp.msu.ru>,
Al Boldi <a1426z@...ab.com>, Mike Galbraith <efault@....de>,
linux-kernel@...r.kernel.org, Nicholas Miell <nmiell@...cast.net>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>, vishpat@...il.com
Subject: Re: is RSDL an "unfair" scheduler too?
Ingo Molnar wrote:
> * Con Kolivas <kernel@...ivas.org> wrote:
>
>> Ok but please look at how it appears from my end (illness aside).
>
> ( I really think we should continue this debate after you get better.
> Everything looks much darker when you are ill! )
>
>> You initially said you were pleased with this design.
>
> I said that 2 years ago about the staircase scheduler and I am still
> saying it about RSDL today. That doesn't make my position automatically
> correct though :-) For example, I wrote and maintained the 4g:4g patchset
> for over 2 years, and still that was no guarantee of it making sense
> upstream ;) And it was a hell of a lot of work (much uglier and nastier
> work than any scheduler hacking and tuning, believe me), and it was
> thrown away as a cute but unnecessary complication we don't need. So
> what? To me what matters is the path you walk, not the destination you
> reach.
>
> In terms of RSDL design, I like it; still, I'm kind of asking myself,
> 'couldn't something in this direction be done in a much simpler and less
> revolutionary way'? For example, couldn't we introduce per-priority-level
> timeslice quotas in the current scheme as well, instead of the very
> simplistic and crude STARVATION_LIMIT approach? Furthermore, couldn't we
> make the timeslices become smaller as the runqueue length increases, to
> make starvation less of a problem? It seems like the problem cases with
> the current scheduler aren't so much centered around the interactivity
> estimator; it is more that timeslices get distributed too coarsely,
> while RSDL distributes timeslices in a more fine-grained way and is thus
> less susceptible to starvation under certain workloads.
Yes. The "doorknob scheduler" worked as follows: the runnable processes
in the system were put in a priority-sorted list and counted. Then the
length of one cycle (turn) was divided by the number of processes, and
that was the timeslice. In the next version, upper and lower limits were
put on the length of a timeslice, so the system didn't get eaten by
context switches under heavy load or get jerky under light load. That
worked fairly well.
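In code, the rule was roughly this (a from-memory sketch; the names and
constants are mine, not the original's):

#define CYCLE_LEN_MS      100   /* length of one full turn (assumed) */
#define TIMESLICE_MIN_MS    5   /* floor: don't drown in context switches */
#define TIMESLICE_MAX_MS   50   /* ceiling: stay responsive when lightly loaded */

/* Divide one cycle evenly among the runnable processes, then clamp. */
static unsigned int doorknob_timeslice(unsigned int nr_runnable)
{
	unsigned int slice;

	if (nr_runnable == 0)
		return TIMESLICE_MAX_MS;

	slice = CYCLE_LEN_MS / nr_runnable;
	if (slice < TIMESLICE_MIN_MS)
		slice = TIMESLICE_MIN_MS;
	if (slice > TIMESLICE_MAX_MS)
		slice = TIMESLICE_MAX_MS;
	return slice;
}

Note that this is exactly "timeslices become smaller as the runqueue
length increases," with the clamp doing the work a quota scheme would
have to do anyway.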
Then people got into creating unfairness to address what they thought
were corner cases, the code turned into a plumber's nightmare, and
jackpot cases created occasional (non-reproducible) hangs. Finally,
management noted that the peripherals cost three times as much as the
CPU, so jobs doing i/o should run first. That made batch run like the
clappers of hell, and actually didn't do all the bad things you might
expect. User input was waitio, disk was waitio, so response was about as
good as it could be for that hardware.
===> It might be useful to give high priority to a process going from
waitio to the runnable state, once only.
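Something like this, say (field and function names invented for
illustration, not from any real scheduler):

struct task {
	int prio;       /* normal priority */
	int io_boost;   /* one-shot flag, set on wakeup from waitio */
};

/* Called when a task leaves waitio and becomes runnable. */
static void io_wakeup(struct task *t)
{
	t->io_boost = 1;
}

/* Called when picking a queue for the task: the boost is consumed on
 * first use, so a task can't camp at high priority by doing trivial
 * i/o in a loop. */
static int effective_prio(struct task *t)
{
	if (t->io_boost) {
		t->io_boost = 0;
		return 0;       /* highest priority, runs first, once */
	}
	return t->prio;
}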
The "doorknob" term came from "everybody gets a turn" and the year was 1970.
Avi Kivity wrote:
> Well, the heuristic here is that process == job. I'm not sure
> heuristic is the right name for it, but it does point out a
> deficiency.
>
> A cpu-bound process with many threads will overwhelm a cpu-bound
> single-threaded process.
>
> A job with many processes will overwhelm a job with a single process.
>
> A user with many jobs can starve a user with a single job.
>
> I don't think the problem here is heuristics, rather that the
> scheduler manages cpu quotas at the task level rather than at the
> user-visible level. If scheduling were managed at all three
> hierarchies I mentioned ('job' is a bit artificial, but process and
> user are not) then:
>
> - if N users are contending for the cpu on a multiuser machine, each
> should get just 1/N of available cpu power. As it is, a user can run
> a few of your #1 workloads (or a make -j 20) and slow every other
> user down - your example would work perfectly (if we can communicate
> to the kernel what a job is)
>
> - multi-threaded processes would not get an unfair advantage
If we wanted to do this, a job would be defined as all children or
threads of the oldest parent process with a PPID of one. So if I logged
on and did

	make -j4

on a kernel, someone else did

	find /var -type f | xargs grep -l zumblegarfe

and someone else was doing

	foo & mumble & barfe

we would all be equal. That's good! And there would be some recursive
scheduler which would pick a "job" and then a process within it, and run
it. That too is good!
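The recursive pick might look something like this (a sketch only; the
structures are invented, and I'm assuming circular lists where every job
on the list has at least one runnable process):

struct proc {
	struct proc *next;      /* ring of runnable procs in one job */
};

struct job {
	struct job *next;       /* ring of runnable jobs */
	struct proc *cur;       /* last process picked in this job */
};

/* Rotate across jobs so each job gets 1/N of the CPU no matter how
 * many processes or threads it contains, then rotate within the
 * chosen job. The same trick nests one level up for users. */
static struct proc *pick_next(struct job **cur_job)
{
	*cur_job = (*cur_job)->next;
	(*cur_job)->cur = (*cur_job)->cur->next;
	return (*cur_job)->cur;
}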
But we have a mail server, and there are 671 threads with a socket and a
POP3 user on each one, and together they get only one job's worth (1/N)
of CPU, and that sucks rocks off the bottom of the ocean. So pretty soon
the code gets some "fixes" to make POP3 work, and X work, and the code
once again becomes a plumber's nightmare... Then you start playing with
process groups to address some of this, and with exec() corner cases,
and complexity goes way up again. This is NOT an easy problem!
I think Con has done a good job, and I think that in most cases (heresy)
the current mainline scheduler does a pretty good job too. I'm in favor
of having several schedulers, because I don't believe any one of them
can match the behavior goals of all users.
--
Bill Davidsen <davidsen@....com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot