Date:   Fri, 7 Sep 2018 14:35:51 +0200
From:   Vincent Guittot <>
To:     Peter Zijlstra <>
Subject: Re: [PATCH] sched/fair: fix load_balance redo for null imbalance

On Friday 07 Sep 2018 at 13:37:49 (+0200), Peter Zijlstra wrote:
> On Fri, Sep 07, 2018 at 09:51:04AM +0200, Vincent Guittot wrote:
> > It can happen that load_balance finds a busiest group and then a busiest rq
> > but the calculated imbalance is in fact null.
> Cute. Does that happen often?

I have a use case with RT tasks that reproduces the problem regularly.
It happens at least when we have CPUs with different capacity, either because
of heterogeneous CPUs or because of RT/DL reducing the capacity available to cfs.
I have put the call path that triggers the problem below and, according to the
comment, it seems that we can reach a similar state when playing with priority.
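
To illustrate why reduced capacity matters, here is a minimal stand-alone sketch
(not kernel code) of how a group's average load is normalised by its capacity.
The helper name scaled_avg_load() and the sample numbers are hypothetical; only
the load * SCHED_CAPACITY_SCALE / capacity scaling is taken from the scheduler.

```c
#define SCHED_CAPACITY_SCALE 1024UL

/*
 * Hypothetical helper: average load of a sched group, normalised by the
 * capacity left for cfs. A smaller capacity inflates the result.
 */
static unsigned long scaled_avg_load(unsigned long group_load,
				     unsigned long group_capacity)
{
	return group_load * SCHED_CAPACITY_SCALE / group_capacity;
}
```

With the same raw load of 512, a CPU at full capacity (1024) reports an
avg_load of 512, while a CPU left with a capacity of 256 by RT/DL reports 2048.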

> > If the calculated imbalance is null, it's useless to try to find a busiest
> > rq as no task will be migrated and we can return immediately.
> > 
> > This situation can happen with heterogeneous system or smp system when RT
> > tasks are decreasing the capacity of some CPUs.
> Is it the result of one of those "force_balance" conditions in
> find_busiest_group() ? Should we not fix that to then return NULL
> instead?

The use case is:
We have a newly_idle load balance that is triggered when an RT task becomes idle
(but I think that I have seen that with idle load balance too).

We trigger:
	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
		goto force_balance;

In calculate_imbalance() we take this path:
	 * Avg load of busiest sg can be less and avg load of local sg can
	 * be greater than avg load across all sgs of sd because avg load
	 * factors in sg capacity and sgs with smaller group_type are
	 * skipped when updating the busiest sg:
	if (busiest->avg_load <= sds->avg_load ||
	    local->avg_load >= sds->avg_load) {
		env->imbalance = 0;
		return fix_small_imbalance(env, sds);

but fix_small_imbalance() finally decides to return without modifying the
imbalance, like here:
	if (busiest->avg_load + scaled_busy_load_per_task >=
	    local->avg_load + (scaled_busy_load_per_task * imbn)) {
		env->imbalance = busiest->load_per_task;
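
The call path above can be reproduced with a small stand-alone model. This is
only a sketch under assumptions, not the kernel implementation: the struct and
function names (sg_stats, lb_model, model_*) are hypothetical, imbn is passed in
by the caller, the throughput check at the end of fix_small_imbalance() is left
out (assumed to find no gain), and the regular path is a placeholder.

```c
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Simplified, hypothetical stand-in for the per-group statistics. */
struct sg_stats {
	unsigned long avg_load;
	unsigned long load_per_task;
	unsigned long group_capacity;
};

/* Hypothetical container for what the balance pass looks at. */
struct lb_model {
	struct sg_stats busiest;
	struct sg_stats local;
	unsigned long sd_avg_load;	/* avg load across all groups of the sd */
	unsigned long imbalance;
};

/* Models only the first test of fix_small_imbalance(). */
static void model_fix_small_imbalance(struct lb_model *m, unsigned int imbn)
{
	unsigned long scaled_busy_load_per_task =
		m->busiest.load_per_task * SCHED_CAPACITY_SCALE /
		m->busiest.group_capacity;

	if (m->busiest.avg_load + scaled_busy_load_per_task >=
	    m->local.avg_load + (scaled_busy_load_per_task * imbn))
		m->imbalance = m->busiest.load_per_task;
	/* Otherwise imbalance keeps the value set by the caller, i.e. 0. */
}

static void model_calculate_imbalance(struct lb_model *m, unsigned int imbn)
{
	if (m->busiest.avg_load <= m->sd_avg_load ||
	    m->local.avg_load >= m->sd_avg_load) {
		m->imbalance = 0;
		model_fix_small_imbalance(m, imbn);
		return;
	}
	/* Placeholder for the regular path, which is not modelled here. */
	m->imbalance = m->busiest.avg_load - m->sd_avg_load;
}

/* Returns false when the pass ends with a null imbalance. */
static bool model_should_balance(struct lb_model *m, unsigned int imbn)
{
	model_calculate_imbalance(m, imbn);
	return m->imbalance != 0;
}
```

With busiest.avg_load = 300, busiest.load_per_task = 200, local.avg_load = 600,
sd_avg_load = 500 and imbn = 2, the first test compares 500 against 1000, so the
imbalance stays at 0 and there is nothing to migrate.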

Besides this patch, I'm preparing another patch in fix_small_imbalance() to
ensure 1 task per CPU in a similar situation but, according to the comment above,
we can reach this situation because of task priority.
