Date:	Fri, 3 Jul 2015 10:34:41 +0100
From:	Morten Rasmussen <morten.rasmussen@....com>
To:	Yuyang Du <yuyang.du@...el.com>
Cc:	Mike Galbraith <umgwanakikbuti@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Rabin Vincent <rabin.vincent@...s.com>,
	"mingo@...hat.com" <mingo@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Paul Turner <pjt@...gle.com>, Ben Segall <bsegall@...gle.com>
Subject: Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()

On Fri, Jul 03, 2015 at 03:37:02AM +0800, Yuyang Du wrote:
> Hi Morten,
> 
> On Thu, Jul 02, 2015 at 12:40:32PM +0100, Morten Rasmussen wrote:
> > detach_tasks() will attempt to pull 62 based on the tasks' task_h_load(), but
> > the task_h_load() sum is only 5 + 10 + 0 and hence detach_tasks() will
> > empty the src_rq.
> > 
> > IOW, since task groups include blocked load in the load_avg_contrib (see
> > __update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()) the
> > imbalance includes blocked load and hence env->imbalance >=
> > sum(task_h_load(p)) for all tasks p on the rq. Which leads to
> > detach_tasks() emptying the rq completely in the reported scenario where
> > blocked load > runnable load.
> 
> Whenever I want to know the load avg concerning a task group, I need to
> walk through the complete code again, and I prefer not to do it this time.
> But it should not be that simple to say "the 118 comes from the blocked load".

But the whole hierarchy of group entities is updated each time we enqueue
or dequeue a task. I don't see how the group entity load_avg_contrib could
be out of date. Why would you need to update it again?

In any case, we have one task in the group hierarchy with a
load_avg_contrib of 0, the great-grandparent group entity has a
load_avg_contrib of 118, and there are no additional tasks. That load
contribution must come from tasks which are no longer on the rq, no?
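
To spell out the arithmetic, here is a toy userspace model of what I
believe happens (not kernel code, just the numbers from the reported
scenario fed through a loop shaped like the one in detach_tasks()):

    /* Toy model: the imbalance (62) is derived from load that includes
     * blocked contributions, but each pull only subtracts the
     * task_h_load() of a runnable task (5, 10 and 0 here), so the
     * imbalance is never satisfied and the rq ends up empty. */
    #include <stdio.h>

    int main(void)
    {
        int imbalance = 62;               /* from the report */
        int task_h_load[] = { 5, 10, 0 }; /* the three runnable tasks */
        int nr_running = 3;
        int i;

        for (i = 0; i < nr_running && imbalance > 0; i++) {
            imbalance -= task_h_load[i];
            printf("pulled task %d, imbalance now %d\n", i, imbalance);
        }
        printf("tasks left: %d, imbalance left: %d\n",
               nr_running - i, imbalance); /* 0 tasks left, 47 remaining */
        return 0;
    }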

> Anyway, with blocked load, yes, we definitely can't move (or even find) some
> amount of the imbalance if we only look at the tasks on the queue. But this
> may or may not be a problem.
> 
> Firstly, the question comes down to whether we want blocked load anywhere.
> This is just a "now vs. average" question.

That is what I meant in the paragraph below. It is a scheduling policy
question.
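
To put rough numbers on it using the figures from this report: the "now"
view of the src_rq is the 5 + 10 + 0 = 15 of task_h_load() that can
actually be migrated, while the "average" view is whatever sits behind
the group entity's load_avg_contrib of 118, most of which has to be
blocked load. An imbalance of 62 computed from the average view can
never be satisfied by pulling the 15 that is runnable, which is how the
rq ends up empty.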

> Secondly, if we stick to average, we just need to treat the blocked load
> consistently, not have the group SE include it while the task SE does not,
> or have it in some places but not in others.

I agree that inconsistent use of blocked load will lead us into trouble.
The problem is that none of the load-balance logic was designed for
blocked load. It was written to deal with load that is currently on the
rq, slightly biased by average cpu load, not with the load contribution
of tasks which we can't migrate at the moment because they are blocked.
The load-balance code has to be updated to deal with blocked load. We
will run into all sorts of issues if we roll out the use of blocked load
everywhere without doing that first.

However, before we can rework the load-balance code, we have to agree on
the now vs average balance policy.

Your proposed patch implements a policy somewhere in between. We try to
balance based on average, but we don't allow idle_balance() to empty a
cpu completely. A pure average balance policy would allow a cpu to go
idle even if we have tasks waiting on other rqs, as long as the blocked
load indicates that other tasks will show up shortly on the cpu. A pure
"now" balance policy would balance based on runnable_load_avg for all
entities, including groups, ignoring all blocked load, but that goes
against the PELT group balancing design.
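
In terms of the same toy model as above (again, just illustrating the
arithmetic, not the actual patch), the in-between policy roughly amounts
to a guard that refuses to take the last runnable task:

    #include <stdio.h>

    int main(void)
    {
        int imbalance = 62;               /* still includes blocked load */
        int task_h_load[] = { 5, 10, 0 };
        int nr_running = 3;
        int i;

        for (i = 0; i < nr_running && imbalance > 0; i++) {
            if (nr_running - i <= 1)
                break;                    /* never empty the cpu */
            imbalance -= task_h_load[i];
        }
        printf("tasks left: %d, imbalance left: %d\n",
               nr_running - i, imbalance); /* 1 task left, 47 remaining */
        return 0;
    }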

I'm not against having a policy that sits somewhere in between; we just
have to agree it is the right policy and clean up the load-balance code
such that the implemented policy is clear.

Morten

>  
> Thanks,
> Yuyang
> 
> > Whether emptying the src_rq is the right thing to do depends on your
> > point of view. Does balanced load (runnable+blocked) take priority over
> > keeping cpus busy or not? For idle_balance() it seems intuitively
> > correct to not empty the rq, and hence you could consider env->imbalance
> > to be too big.
> > 
> > I think we will see more of this kind of problem if we include
> > weighted_cpuload() as well. Parts of the imbalance calculation code are
> > quite old and could use some attention first.
> > 
> > A short term fix could be what Yuyang proposes: stop pulling tasks when
> > there is only one left in detach_tasks(). It won't affect active load
> > balance, where we may want to migrate the last task, since active load
> > balance doesn't use detach_tasks().
> > 
> > Morten
