Message-ID: <f2b55d220704191720k761ffce2qa8cdb4cb58cc8c79@mail.gmail.com>
Date: Thu, 19 Apr 2007 17:20:53 -0700
From: "Michael K. Edwards" <medwards.linux@...il.com>
To: "Con Kolivas" <kernel@...ivas.org>
Cc: ray-gmail@...rabbit.org, "Ingo Molnar" <mingo@...e.hu>,
"Andrew Morton" <akpm@...ux-foundation.org>,
"Nick Piggin" <npiggin@...e.de>,
"Linus Torvalds" <torvalds@...ux-foundation.org>,
"Matt Mackall" <mpm@...enic.com>,
"William Lee Irwin III" <wli@...omorphy.com>,
"Peter Williams" <pwil3058@...pond.net.au>,
"Mike Galbraith" <efault@....de>, "ck list" <ck@....kolivas.org>,
"Bill Huey" <billh@...ppy.monkey.org>,
linux-kernel@...r.kernel.org,
"Arjan van de Ven" <arjan@...radead.org>,
"Thomas Gleixner" <tglx@...utronix.de>
Subject: Re: Renice X for cpu schedulers
On 4/19/07, Con Kolivas <kernel@...ivas.org> wrote:
> The cpu scheduler core is a cpu bandwidth and latency
> proportionator and should be nothing more or less.
Not really. The CPU scheduler is (or ought to be) what electric
utilities call an economic dispatch mechanism -- a real-time
controller whose goal is to service competing demands cost-effectively
from a limited supply, without compromising system stability.
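For readers who haven't met the term, the core idea of economic dispatch can be sketched as a greedy "merit order": serve demand from the cheapest available source first. This is a toy model under stated assumptions (constant marginal costs, no ramp-rate limits, no transmission losses), and every name and number in it (`Source`, `dispatch`, the capacities and costs) is invented for illustration. It is not a description of any real utility's dispatch system or of any scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    capacity: float       # maximum output, arbitrary units (hypothetical)
    marginal_cost: float  # cost per unit of output (hypothetical)


def dispatch(sources, demand):
    """Greedy merit-order dispatch: fill demand from the cheapest source up.

    Returns (allocation by source name, total cost, unmet demand).
    """
    allocation = {}
    total_cost = 0.0
    remaining = demand
    for src in sorted(sources, key=lambda s: s.marginal_cost):
        take = min(src.capacity, remaining)  # never exceed capacity or demand
        allocation[src.name] = take
        total_cost += take * src.marginal_cost
        remaining -= take
    return allocation, total_cost, remaining


# Illustrative numbers only.
sources = [
    Source("coal", capacity=50.0, marginal_cost=2.0),
    Source("hydro", capacity=20.0, marginal_cost=1.0),
    Source("gas", capacity=40.0, marginal_cost=5.0),
]
allocation, cost, unmet = dispatch(sources, demand=80.0)
print(allocation, cost, unmet)
```

A nonzero third return value is the "brownout" case: demand that the limited supply cannot cover at any price.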
If you live in the 1960s, coal and nuclear (and a little bit of
fig-leaf hydro) are all you have, it takes you twelve hours to bring
plants on and off line, and there's no live operational control or
pricing signal between you and your customers. So you're stuck
running your system at projected peak + operating margin, dumping
excess power as waste heat most of the time, and browning or blacking
people out willy-nilly when there's excess demand. Maybe you get to
trade off shedding the loads with the worst transmission efficiency
against degrading the customers with the most tolerance for brownouts
(or the least regulatory clout). That's life without modern economic
dispatch.
If you live in 2007, natural gas and (outside the US) better control
over nuclear plants give you more ability to ramp supply up and down
with demand on something like a 15-minute cycle. Better yet, you can
store a little energy "in the grid" to smooth out instantaneous demand
fluctuations; if you're lucky, you also have enough fast-twitch hydro
(thanks, Canada!) that you can run your coal and lame-ass nuclear very
close to base load even when gas is expensive, and even pump water
back uphill when demand dips. (Coal is nasty stuff and a worse
contributor by far to radiation exposure than nuclear generation; but
on current trends it's going to last a lot longer than oil and gas,
and it's a lot easier to stockpile next to the generator.)
Best of all, you have industrial customers who will trade you live
control (within limits) over when and how much power they take in
return for a lower price per unit energy. Some of them will even dump
power back into the grid when you ask them to. So now the biggest
challenge in making supply and demand meet (in the short term) is to
damp all the different ways that a control feedback path might result
in an oscillation -- or in runaway pricing. Because there's always
some asshole greedhead who will gamble with system stability in order
to game the pricing mechanism. Lots of 'em, if you're in California
and your legislature is so dumb, or so bought, that they let the
asshole greedheads design the whole system so they can game it to the
max. (But that's a whole 'nother rant.)
Embedded systems are already in 2007, and the mainline Linux scheduler
frankly sucks on them, because it thinks it's back in the 1960s with
a fixed supply and captive demand, pissing away "CPU bandwidth" as
waste heat. Not to say it's an easy problem; even academics with a
dozen publications in this area don't seem to be able to model energy
usage to the nearest big O, let alone design a stable economic
dispatch engine. But it helps to acknowledge what the problem is:
even in a 1960s raised-floor screaming-air-conditioners
screw-the-power-bill machine room, you can't actually run a
half-decent CPU flat out any more without burning it to a crisp.
You can act ignorant and let the PMIC brown you out when it has to.
Or you can start coping in mainline the way that organizations big
enough (and smart enough) to feel the heat in their pocketbooks do in
their pet kernels. (Boo on Google for not sharing, and props to IBM
for doing their damnedest.) And guess what? The system will actually
get simpler, and stabler, and faster, and easier to maintain, because
it'll be based on a real theory of operation with equations and things
instead of a bunch of opaque, undocumented shotgun heuristics.
This hypothetical economic-dispatch scheduler will still _have_
heuristics, of course -- you can't begin to model a modern CPU
accurately on-line. But they will be contained in _data_ rather than
_code_, and issues of numerical stability will be separated cleanly
from the rule set. You'll be able to characterize the rule set's
domain of stability, given a conservative set of assumptions about the
feedback paths in the system under control, with the sort of
techniques they teach in the engineering schools that none of us (me
included) seem to have attended. (I went to school thinking I was
going to be a physicist. Wishful thinking -- but I was young and
stupid. What's your excuse? ;-)
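One way to picture "heuristics in data rather than code, with numerical stability separated from the rule set" is the sketch below. It is only an illustration under loud assumptions: the rule table, the frequencies, the `MAX_STEP_MHZ` limit, and both function names are hypothetical, and a simple slew-rate limit stands in for whatever damping a real controller would need. It does not describe any existing kernel interface.

```python
# Rule set as data: (utilization upper bound, target frequency in MHz).
# All values are invented for illustration.
RULES = [
    (0.30, 400),
    (0.70, 800),
    (1.00, 1200),
]

MAX_STEP_MHZ = 200  # damping limit per control tick (hypothetical)


def target_from_rules(utilization, rules=RULES):
    """Look up the target frequency for a utilization in [0, 1]."""
    for upper_bound, freq in rules:
        if utilization <= upper_bound:
            return freq
    return rules[-1][1]  # above every bound: use the top band


def damped_step(current, target, max_step=MAX_STEP_MHZ):
    """Move toward the target by at most max_step per tick.

    This knows nothing about the rule table, so the damping can be
    analyzed or tuned without touching the rules, and vice versa.
    """
    delta = max(-max_step, min(max_step, target - current))
    return current + delta


freq = 400
trace = []
for utilization in [0.9, 0.9, 0.9, 0.9, 0.1, 0.1]:
    freq = damped_step(freq, target_from_rules(utilization))
    trace.append(freq)
print(trace)
```

Swapping in a different `RULES` table changes the policy without changing the damping code, which is the separation the paragraph above argues for.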
OK, it feels better to have that off my chest. Apologies to those
readers -- doubtless the vast majority of LKML, including everyone
else in this thread -- for whom it's irrelevant, pseudo-learned
pontification with no patch attached. And my sincere thanks to Ingo,
Con, and really everyone else CC'ed, without whom Linux wouldn't be as
good as it is (really quite good, all things considered) and wouldn't
contribute as much as it does to my own livelihood.
Cheers,
- Michael