linux-kernel - Re: [PATCH] Fix longstanding load balancing bug in the scheduler.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.64.0609071511370.18416@schroedinger.engr.sgi.com>
Date:	Thu, 7 Sep 2006 15:20:32 -0700 (PDT)
From:	Christoph Lameter <clameter@....com>
To:	Nick Piggin <npiggin@...e.de>
cc:	akpm@...l.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Fix longstanding load balancing bug in the scheduler.

On Thu, 7 Sep 2006, Nick Piggin wrote:

> > Right but that situation is rare and the performance is certainly
> > better if the unpinned processes are not all running on the same 
> > processor.
> 
> But the situation is rare full stop otherwise we'd have had more
> complaints. And we do tend to care about theoretical max latency
> in many other obscure paths than the scheduler.

The situation is rare if you just have a couple of cpus and it also does 
not occur if you do not pin processors.

> How about if you have N/2 CPUs with lots of stuff on runqueues?
> Then the other N/2 will each be scanning N/2+1 runqueues... that's
> a lot of bus traffic going on which you probably don't want.

Then the load will likely be sitting on a runqueue and run 
much slower since it has to share cpus although many cpus are available 
to take new jobs. The scanning is very fast and it is certainly better 
than continuing to overload a single processor.

> > That wont work since the notion of "pinned" is relative to a cpu.
> > A process may be pinned to a group of processors. It will only appear to 
> > be pinned for cpus outside that set of processors!
> 
> A sched-domain is per-CPU as well. Why doesn't it work?

Lets say you store the latest set of pinned cpus encountered (I am not 
sure what cpu number would accomplish). The next time you have to 
reschedule the situation may be completely different and you would have
to revalidate the per cpu pinned cpu set. 

> > What good does storing a processor number do if the processes can change 
> > dynamically and so may the pinning of processors. We are dealing with
> > a set of processes. Each of those may be pinned to a set of processors.
> 
> Yes, it isn't going to be perfect, but it would at least work (which
> it doesn't now) without introducing that latency.

Could you tell us how this could work?

> > This looks to me as a design flaw that would require either a major rework 
> > of the scheduler or (my favorite) a delegation of difficult (and 
> > computational complex and expensive) scheduler decisions to user space.
> 
> I'd be really happy to move this to userspace :)

Can we merge this patch and then I will try to move this as fast as 
possible to userspace?

Ok then we need to do the following to address the current issue and to
open up the possibilities for future enhancements:

1. Add a new field to /proc/<pid>/cpu that can be written to and that
   forces the process to be moved to the corresponding cpu.

2. Extend statistics in /proc/<pid>/ to allow the export of more scheduler
   information and a special flag that is set if a process cannot be 
   rescheduling. initially the user space scheduler may just poll this 
  field.

3. Optional: Some sort of notification mechanism that a user space 
   scheduler helper could subscribe to. If a pinned tasks situation
   is encountered then the notification passes the number of the
   idle cpu.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/