Message-ID: <20150703163831.GQ3644@twins.programming.kicks-ass.net>
Date: Fri, 3 Jul 2015 18:38:31 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Morten Rasmussen <morten.rasmussen@....com>
Cc: Yuyang Du <yuyang.du@...el.com>,
Mike Galbraith <umgwanakikbuti@...il.com>,
Rabin Vincent <rabin.vincent@...s.com>,
"mingo@...hat.com" <mingo@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Paul Turner <pjt@...gle.com>, Ben Segall <bsegall@...gle.com>
Subject: Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
On Fri, Jul 03, 2015 at 10:34:41AM +0100, Morten Rasmussen wrote:
> > > IOW, since task groups include blocked load in the load_avg_contrib (see
> > > __update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()) the
> > > imbalance includes blocked load and hence env->imbalance >=
> > > sum(task_h_load(p)) for all tasks p on the rq. Which leads to
> > > detach_tasks() emptying the rq completely in the reported scenario where
> > > blocked load > runnable load.
So IIRC we need the blocked load for groups in order to compute the per-cpu
slices of the total weight; the average works really well for that.
> I'm not against having a policy that sits somewhere in between, we just
> have to agree it is the right policy and clean up the load-balance code
> such that the implemented policy is clear.
Right, for balancing it's a tricky question, but mixing them without
intent is, as you say, a bit of a mess.
So clearly blocked load doesn't make sense for (new)idle balancing. OTOH
it does make some sense for the regular periodic balancing, because
there we really do care mostly about the averages, esp. so when we're
overloaded -- but there are issues there too.
Now we can't track them both (or rather we could, but the overhead would
be prohibitive).
I like Yuyang's load tracking rewrite, but it changes exactly this part,
and I'm not sure I understand the full ramifications of that yet.
One way out would be to split the load balancer into 3 distinct regions:
1) get a task on every CPU, screw everything else.
2) get each CPU fully utilized, still ignoring 'load'
3) when everybody is fully utilized, consider load.
If we make find_busiest_foo() select one of these 3, and make
calculate_imbalance() invariant to the metric passed in, and have things
like cpu_load() and task_load() return different, but coherent, numbers
depending on which region we're in, this almost sounds 'simple'.
The devil is in the details, and the balancer is a hairy nest of details
which will make the above non-trivial.
But for 1) we could simply 'balance' on nr_running, for 2) we can
'balance' on runnable_avg and for 3) we'll 'balance' on load_avg (which
will then include blocked load).
Let me go play outside for a bit so that it can sink in what kind of
nonsense my heat addled brain has just sprouted :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/