linux-kernel - Re: [PATCH 0/6] [RFC] Large weight differential leads to inefficient load balancing


Message-ID: <1280917121.1923.931.camel@laptop>
Date:	Wed, 04 Aug 2010 12:18:41 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Nikhil Rao <ncrao@...gle.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Mike Galbraith <efault@....de>,
	linux-kernel@...r.kernel.org,
	Venkatesh Pallipadi <venki@...gle.com>,
	Ken Chen <kenchen@...gle.com>, Paul Turner <pjt@...gle.com>
Subject: Re: [PATCH 0/6] [RFC] Large weight differential leads to
 inefficient  load balancing

On Tue, 2010-08-03 at 14:28 -0700, Nikhil Rao wrote:

> I see your point here, and yes I agree having 1 nice-0 on one cpu, 512
> SCHED_IDLE tasks on another cpu and all other cpus idle is correct if
> we only considered fairness. However, we would also like to maximize
> machine utilization. The fitness function we would ideally like to
> optimize for is a combination of both fairness and utilization.

Sure, I see (and agree with) the fact that we want to optimize
utilization as well (although I bet the power management people might
feel otherwise :-)

> Thanks for your suggestions; I explored the first one a bit and I
> added a check into find_busiest_queue() (instead of
> find_busiest_group()) to skip a cpu if it has only 1 task on it (patch
> attached below - did you have something else in mind?). 

You might also need some changes to find_busiest_group(). Suppose you
have a 4-cpu machine with 2 groups of 2 cpus, and 4 tasks: 2 nice-0 and
2 idle. If both nice-0 tasks are in the same group, each on its own cpu,
then f_b_g() could select that group as the busiest (it's got W=2048,
against W=4 for the other group, after all).

Once you have that group, f_b_q() won't be able to do anything sensible.
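
To make the arithmetic concrete, here is a toy user-space model of that
scenario. It is only a sketch, not the kernel code: the toy_* names and
struct are made up for illustration, the per-task weights (1024 for
nice-0, 2 for idle) are just what W=2048 and W=4 imply for two tasks
each, and placing both idle tasks on one cpu of the second group is an
assumption.

```c
#include <assert.h>

#define NR_GROUPS       2
#define CPUS_PER_GROUP  2
#define TOY_W_NICE0     1024UL  /* implied by W=2048 for two nice-0 tasks */
#define TOY_W_IDLE      2UL     /* implied by W=4 for two idle tasks */

/* Hypothetical per-cpu summary; not the kernel's struct rq. */
struct toy_cpu {
	unsigned long weight;      /* sum of task weights on this cpu */
	unsigned int nr_running;   /* number of runnable tasks */
};

/* Group 0: one nice-0 task per cpu. Group 1: both idle tasks on cpu 0. */
void toy_fill_example(struct toy_cpu cpus[NR_GROUPS][CPUS_PER_GROUP])
{
	cpus[0][0] = (struct toy_cpu){ TOY_W_NICE0, 1 };
	cpus[0][1] = (struct toy_cpu){ TOY_W_NICE0, 1 };
	cpus[1][0] = (struct toy_cpu){ 2 * TOY_W_IDLE, 2 };
	cpus[1][1] = (struct toy_cpu){ 0, 0 };
}

/* Toy f_b_g(): pick the group with the largest summed weight. */
int toy_busiest_group(struct toy_cpu cpus[NR_GROUPS][CPUS_PER_GROUP])
{
	unsigned long max = 0;
	int g, c, busiest = -1;

	for (g = 0; g < NR_GROUPS; g++) {
		unsigned long sum = 0;

		for (c = 0; c < CPUS_PER_GROUP; c++)
			sum += cpus[g][c].weight;
		if (sum > max) {
			max = sum;
			busiest = g;
		}
	}
	return busiest;
}

/*
 * Toy f_b_q() with the "skip a cpu that has only 1 task" check:
 * returns the heaviest cpu with more than one task, or -1 if none.
 */
int toy_busiest_queue(struct toy_cpu group[CPUS_PER_GROUP])
{
	unsigned long max = 0;
	int c, busiest = -1;

	for (c = 0; c < CPUS_PER_GROUP; c++) {
		if (group[c].nr_running <= 1)
			continue;
		if (group[c].weight > max) {
			max = group[c].weight;
			busiest = c;
		}
	}
	return busiest;
}
```

With these numbers toy_busiest_group() picks group 0 (2048 vs 4), and
toy_busiest_queue() on group 0 then finds no candidate, since every cpu
there runs exactly one task.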

> This fixes the
> example I posted in the RFC, but it doesn't work as well when the
> SCHED_NORMAL tasks have a sleep/wakeup pattern. I have some data below
> where the load balancer fails to fully utilize a machine. In these
> examples, I ran with the upstream kernel and with a kernel compiled
> with the check in fbq().

Right, so wakeup/sleep are indeed more interesting. For wakeups we also
have select_task_rq() to consider; it is responsible for choosing where
to run the newly woken task.

For sleeps we have new-idle balancing, which is a lot like regular
load balancing but differs enough to need looking at.

From the data you provided I cannot tell which of these two is
responsible for what you see (although under-utilization suggests
the new-idle balancer). You can use perf/ftrace to look at what your
tasks are doing and how they could be doing it better (Arjan's timechart
might be a good help).

If they get woken on the wrong CPU, it's select_task_rq(); if they leave
a CPU idle too long, it's new-idle balancing -- or possibly it's something
I overlooked altogether :-)
