linux-kernel - Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070417035528.GE25513@wotan.suse.de>
Date:	Tue, 17 Apr 2007 05:55:28 +0200
From:	Nick Piggin <npiggin@...e.de>
To:	"Michael K. Edwards" <medwards.linux@...il.com>
Cc:	Peter Williams <pwil3058@...pond.net.au>,
	William Lee Irwin III <wli@...omorphy.com>,
	Ingo Molnar <mingo@...e.hu>, Matt Mackall <mpm@...enic.com>,
	Con Kolivas <kernel@...ivas.org>, linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mike Galbraith <efault@....de>,
	Arjan van de Ven <arjan@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

On Mon, Apr 16, 2007 at 04:10:59PM -0700, Michael K. Edwards wrote:
> On 4/16/07, Peter Williams <pwil3058@...pond.net.au> wrote:
> >Note that I talk of run queues
> >not CPUs as I think a shift to multiple CPUs per run queue may be a good
> >idea.
> 
> This observation of Peter's is the best thing to come out of this
> whole foofaraw.  Looking at what's happening in CPU-land, I think it's
> going to be necessary, within a couple of years, to replace the whole
> idea of "CPU scheduling" with "run queue scheduling" across a complex,
> possibly dynamic mix of CPU-ish resources.  Ergo, there's not much
> point in churning the mainline scheduler through a design that isn't
> significantly more flexible than any of those now under discussion.

Why? If you do that, then your load balancer just becomes less flexible
because it is harder to have tasks run on one or the other.

You can have single-runqueue-per-domain behaviour (or close to) just by
relaxing all restrictions on idle load balancing within that domain. It
is harder to go the other way and place any per-cpu affinity or
restirctions with multiple cpus on a single runqueue.


> For instance, there are architectures where several "CPUs"
> (instruction stream decoders feeding execution pipelines) share parts
> of a cache hierarchy ("chip-level multitasking").  On these machines,
> you may want to co-schedule a "real" processing task on one pipeline
> with a "cache warming" task on the other pipeline -- but only for
> tasks whose memory access patterns have been sufficiently analyzed to
> write the "cache warming" task code.  Some other tasks may want to
> idle the second pipeline so they can use the full cache-to-RAM
> bandwidth.  Yet other tasks may be genuinely CPU-intensive (or I/O
> bound but so context-heavy that it's not worth yielding the CPU during
> quick I/Os), and hence perfectly happy to run concurrently with an
> unrelated task on the other pipeline.

We can do all that now with load balancing, affinities or by shutting
down threads dynamically.


> There are other architectures where several "hardware threads" fight
> over parts of a cache hierarchy (sometimes bizarrely described as
> "sharing" the cache, kind of the way most two-year-olds "share" toys).
> On these machines, one instruction pipeline can't help the other
> along cache-wise, but it sure can hurt.  A scheduler designed, tested,
> and tuned principally on one of these architectures (hint:
> "hyperthreading") will probably leave a lot of performance on the
> floor on processors in the former category.
> 
> In the not-so-distant future, we're likely to see architectures with
> dynamically reconfigurable interconnect between instruction issue
> units and execution resources.  (This is already quite feasible on,
> say, Virtex4 FX devices with multiple PPC cores, or Altera FPGAs with
> as many Nios II cores as fit on the chip.)  Restoring task context may
> involve not just MMU swaps and FPU instructions (with state-dependent
> hidden costs) but processsor reconfiguration.  Achieving "fairness"
> according to any standard that a platform integrator cares about (let
> alone an end user) will require a fairly detailed model of the hidden
> costs associated with different sorts of task switch.
> 
> So if you are interested in schedulers for some reason other than a
> paycheck, let the distros worry about 5% improvements on x86[_64].
> Get hold of some different "hardware" -- say:
>  - a Xilinx ML410 if you've got $3K to blow and want to explore
> reconfigurable processors;
>  - a SunFire T2000 if you've got $11K and want to mess with a CMT
> system that's actually shipping;
>  - a QEMU-simulated massively SMP x86 if you're poor but clever
> enough to implement funky cross-core cache effects yourself; or
>  - a cycle-accurate simulator from Gaisler or Virtio if you want a
> real research project.
> Then go explore some more interesting regions of parameter space and
> see what the demands on mainline Linux will look like in a few years.

There are no doubt improvements to be made, but they are generally
intended to be able to be done within the sched-domains framework. I
am not aware of a particular need that would be impossible to do using
that topology hierarchy and per-CPU runqueues, and there are added
complications involved with multiple CPUs per runqueue.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/