Message-ID: <Pine.LNX.4.64.0609071943410.19638@schroedinger.engr.sgi.com>
Date: Thu, 7 Sep 2006 19:52:22 -0700 (PDT)
From: Christoph Lameter <clameter@....com>
To: Nick Piggin <npiggin@...e.de>
cc: akpm@...l.org, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...e.hu>,
Suresh B <suresh.b.siddha@...el.com>
Subject: Re: [PATCH] Fix longstanding load balancing bug in the scheduler.
On Fri, 8 Sep 2006, Nick Piggin wrote:
> > Then the load will likely be sitting on a runqueue and run
> > much slower since it has to share cpus although many cpus are available
> > to take new jobs. The scanning is very fast and it is certainly better
> > than continuing to overload a single processor.
>
> But it is N^2... I thought SGI of all would hesitate to put such an
> algorithm into the scheduler ;)
You definitely got that one right. But we would prefer a working
scheduler over a broken one.
> > Could you tell us how this could work?
>
> You keep track of a fallback CPU. Then, if balancing fails because all
> processes are pinned, you just try pulling from the fallback CPU. If
> that fails, set the fallback to the next CPU.
Ok. As you note below, this won't do too much good.
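Just so we are talking about the same thing, here is a rough, stand-alone
user-space model of that fallback idea. The toy runqueue, the task counts
and the helper names below are hypothetical stand-ins for the real
scheduler data structures, not a proposed patch:

/* Toy model of the fallback-CPU heuristic described above. */
#include <stdio.h>

#define NR_CPUS 8

struct toy_rq {
	int movable;	/* tasks that may migrate */
	int pinned;	/* tasks bound to this CPU */
};

static struct toy_rq rq[NR_CPUS];
static int fallback_cpu;	/* remembered between balance attempts */

/* Try to pull one movable task from src to dst; returns 1 on success. */
static int try_pull(int src, int dst)
{
	if (src == dst || rq[src].movable == 0)
		return 0;
	rq[src].movable--;
	rq[dst].movable++;
	return 1;
}

/*
 * Called when normal balancing failed because everything it looked at
 * was pinned: try the single remembered fallback CPU instead of
 * scanning all N runqueues, and rotate the fallback on failure.
 */
static void balance_fallback(int this_cpu)
{
	if (try_pull(fallback_cpu, this_cpu))
		return;
	fallback_cpu = (fallback_cpu + 1) % NR_CPUS;
}

int main(void)
{
	rq[3].movable = 4;		/* one overloaded CPU with movable work */
	rq[5].pinned = 4;		/* another loaded only with pinned tasks */

	for (int i = 0; i < NR_CPUS; i++)
		balance_fallback(0);	/* idle CPU 0 keeps asking for work */

	printf("cpu0 now has %d movable tasks\n", rq[0].movable);
	return 0;
}

As you can see, a single rotating fallback only finds the overloaded CPU
eventually, which is why it does not help much in the pinned-heavy case.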
> OTOH that still results in suboptimal scheduler, and now that I remember
> your workload, you had a small number of CPUs screwing up balancing for
> a big system. Hmm... so this may be not great for you.
Right. Let's put this in as a temporary measure and then we'll try to get
to a long-term solution that avoids doing N^2 searches in the kernel.
> I'd like to get an ack from Ingo, but otherwise OK I guess.
Ok. Let's see what others say.
> Hmm, how about
>
> 1. Export SD information in /sys/
> 2. Export runqueue load information there too
Ok. That would work.
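A user-space balancer could then consume that directly. A rough sketch,
assuming a purely hypothetical per-cpu attribute such as
/sys/devices/system/cpu/cpuN/sched_load (no such file exists today):

#include <stdio.h>

#define NR_CPUS 8

/* Read the hypothetical exported load value for one CPU; -1 if absent. */
static long read_cpu_load(int cpu)
{
	char path[128];
	long load = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/sched_load", cpu);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &load) != 1)
		load = -1;
	fclose(f);
	return load;
}

int main(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu%d load: %ld\n", cpu, read_cpu_load(cpu));
	return 0;
}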
> 3. One attribute in /sys/.../CPUn/ will be written to a CPU number m and
> a count t, which will try to pull t tasks from CPUm to CPUn
Too contorted and too difficult to use. Having a cpu field in
/proc/<pid>/cpu is self-explanatory, easy to understand, and allows
per-process control from the user-space mechanism. It could also be used
to test things or to manually intervene in the scheduler.
Transferring an abstract number of processes from one cpu to another is
a rare operation; one strives to have one process per cpu.
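For comparison, the closest existing per-process handle from user space is
sched_setaffinity(), which pins a task to a CPU rather than merely
migrating it once, but it shows the kind of per-process control meant
here:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Usage: ./setcpu <pid> <cpu> - restrict <pid> to <cpu>. */
int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <pid> <cpu>\n", argv[0]);
		return 1;
	}

	pid_t pid = (pid_t)atoi(argv[1]);
	int cpu = atoi(argv[2]);

	cpu_set_t set;
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	if (sched_setaffinity(pid, sizeof(set), &set) != 0) {
		perror("sched_setaffinity");
		return 1;
	}
	printf("pid %d restricted to cpu %d\n", pid, cpu);
	return 0;
}

A writable /proc/<pid>/cpu would be the same sort of knob, minus the
side effect of permanently restricting the allowed cpumask.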
> 4. Another attribute would be the result of the last balance attempt
> (eg. 'all pinned').
Ok.