linux-kernel - Re: [PATCH] Fix longstanding load balancing bug in the scheduler.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.64.0609071016250.16674@schroedinger.engr.sgi.com>
Date:	Thu, 7 Sep 2006 10:24:26 -0700 (PDT)
From:	Christoph Lameter <clameter@....com>
To:	Nick Piggin <npiggin@...e.de>
cc:	akpm@...l.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Fix longstanding load balancing bug in the scheduler.

Hmmm... Some more comments

On Thu, 7 Sep 2006, Nick Piggin wrote:

> So what I worry about with this approach is that it can really blow
> out the latency of a balancing operation. Say you have N-1 CPUs with
> lots of stuff locked on their runqueues.

Right but that situation is rare and the performance is certainly
better if the unpinned processes are not all running on the same 
processor.

> The solution I envisage is to do a "rotor" approach. For example
> the last attempted CPU could be stored in the starving CPU's sd...
> and it will subsequently try another one.

That wont work since the notion of "pinned" is relative to a cpu.
A process may be pinned to a group of processors. It will only appear to 
be pinned for cpus outside that set of processors!

What good does storing a processor number do if the processes can change 
dynamically and so may the pinning of processors. We are dealing with
a set of processes. Each of those may be pinned to a set of processors.

> I've been hot and cold on such an implementation for a while: on one
> hand it is a real problem we have; OTOH I was hoping that the domain
> balancing might be better generalised. But I increasingly don't
> think we should let perfect stand in the way of good... ;)

I think we should fix the problem ASAP. Lets do this fix and then we can 
think about other solutions. You already had lots of time to think about
the rotor.

This looks to me as a design flaw that would require either a major rework 
of the scheduler or (my favorite) a delegation of difficult (and 
computational complex and expensive) scheduler decisions to user space.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/