Message-ID: <20151214144645.GA23930@e105550-lin.cambridge.arm.com>
Date: Mon, 14 Dec 2015 14:46:46 +0000
From: Morten Rasmussen <morten.rasmussen@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Yuyang Du <yuyang.du@...el.com>,
Andrey Ryabinin <aryabinin@...tuozzo.com>, mingo@...hat.com,
linux-kernel@...r.kernel.org, Paul Turner <pjt@...gle.com>,
Ben Segall <bsegall@...gle.com>
Subject: Re: [PATCH] sched/fair: fix mul overflow on 32-bit systems
On Mon, Dec 14, 2015 at 03:20:21PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 14, 2015 at 01:07:26PM +0000, Morten Rasmussen wrote:
>
> > Agreed, >100% is a transient state (which can be rather long) that only
> > means over-utilized, nothing more. Would you like the metric itself to
> > be changed to saturate at 100% or just cap it to 100% when used?
>
> We already cap it when using it IIRC. But no, I was thinking of the
> measure itself.
Yes, okay.
>
> > It is not straight forward to provide a bound on the sum.
>
> Agreed..
>
> > There isn't one for load_avg either.
>
> But that one is fundamentally unbound, whereas the util thing is
> fundamentally bound, except our implementation isn't.
Agreed.
>
> > If we want to guarantee an upper bound for
> > cfs_rq->avg.util_sum we have to somehow cap the se->avg.util_avg
> > contributions for each sched_entity. This cap depends on the cpu and how
> > many other tasks are associated with that cpu. The cap may have to
> > change when tasks migrate.
>
> Yep, blows :-)
>
> > > However, I think that makes sense, but would propose doing it
> > > differently. That condition is generally a maximum (assuming proper
> > > functioning of the weight based scheduling etc..) for any one task, so
> > > on migrate we can hard clip to this value.
>
> > Why use load.weight to scale util_avg? It is affected by priority. Isn't
> > just the ratio 1/nr_running that you are after?
>
> Remember, the util thing is based on running, so assuming each task
> always wants to run, each task gets to run w_i/\Sum_j w_j due to CFS
> being a weighted fair queueing thingy.
Of course, yes.
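
To make that bound concrete, here is a minimal userspace sketch of the
clip value being discussed. UTIL_SCALE and util_clip() are made-up names
for the example, not actual kernel code, and the capacity scale is assumed
to be 1024 for simplicity:

#include <stdio.h>

#define UTIL_SCALE 1024	/* assume 100% of one cpu == 1024 */

/* Bound on one entity's utilization: capacity * w_i / \Sum_j w_j */
static unsigned long util_clip(unsigned long se_weight,
			       unsigned long cfs_rq_weight)
{
	if (!cfs_rq_weight)
		return UTIL_SCALE;

	return (UTIL_SCALE * se_weight) / cfs_rq_weight;
}

int main(void)
{
	/* e.g. three equal-weight always-running tasks -> ~341 each */
	printf("clip = %lu\n", util_clip(1024, 3 * 1024));
	return 0;
}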
>
> > IIUC, you propose to clip the sum itself. In which case you are running
> > into trouble when removing tasks. You don't know how much to remove from
> > the clipped sum.
>
> Right, then we'll have to slowly gain it again.
If you have a seriously over-utilized cpu and migrate some of the tasks
to a different cpu, the old cpu may temporarily look lightly utilized
even if we leave some big tasks behind. That might lead us into trouble
if we start using util_avg as the basis for cpufreq decisions. If we care
about performance, the safe choice is to consider an over-utilized cpu as
still over-utilized even after we have migrated tasks away. We can only
trust that the cpu is no longer over-utilized when cfs_rq->avg.util_avg
'naturally' drops below 100%. So from that point of view, it might be
better to let it stay at 100% and let it sort itself out.
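
To spell out the add/remove problem with a clipped sum, a purely
illustrative userspace sketch (all names invented, not the actual kernel
code):

#define UTIL_SCALE 1024	/* assume 100% of one cpu == 1024 */

/* Attach a task's util_avg contribution and clip the sum at 100%. */
static void attach_util(unsigned long *cfs_util, unsigned long task_util)
{
	*cfs_util += task_util;
	if (*cfs_util > UTIL_SCALE)
		*cfs_util = UTIL_SCALE;	/* the overshoot is lost here */
}

/*
 * On detach/migration we no longer know how much of task_util actually
 * survived the clipping above, so subtracting it can leave the remaining
 * tasks looking lightly utilized; the sum then has to be regained through
 * the normal PELT updates ("slowly gain it again").
 */
static void detach_util(unsigned long *cfs_util, unsigned long task_util)
{
	*cfs_util -= task_util < *cfs_util ? task_util : *cfs_util;
}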
> > Another problem is that load.weight is just a snapshot while
> > avg.util_avg includes tasks that are not currently on the rq so the
> > scaling factor is probably bigger than what you want.
>
> Our weight guestimates also include non running (aka blocked) tasks,
> right?
The rq/cfs_rq load.weight doesn't. It is updated through
update_load_{add,sub}() in account_entity_{enqueue,dequeue}(), so it only
covers runnable+running tasks, I think.
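
For reference, those helpers are roughly the following (paraphrased from
memory, not a verbatim quote of kernel/sched/fair.c, so details may differ
between versions):

struct load_weight {
	unsigned long	weight;
	unsigned int	inv_weight;
};

static inline void update_load_add(struct load_weight *lw, unsigned long inc)
{
	lw->weight += inc;
	lw->inv_weight = 0;	/* force recomputation of the inverse */
}

static inline void update_load_sub(struct load_weight *lw, unsigned long dec)
{
	lw->weight -= dec;
	lw->inv_weight = 0;
}

account_entity_{enqueue,dequeue}() call these with se->load.weight, so a
blocked entity drops out of cfs_rq->load.weight entirely while its
contribution lingers (and decays) in the PELT sums.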
> > If we leave the sum as it is (unclipped) add/remove shouldn't give us
> > any problems. The only problem is the overflow, which is solved by using
> > a 64bit type for load_avg. That is not an acceptable solution?
>
> It might be. After all, any time any of this is needed we're CPU bound
> and the utilization measure is pointless anyway. That measure only
> matters if it's small and the sum is 'small'. After that it's back to the
> normal load based thingy.
Yes, agreed.
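
For completeness, a self-contained sketch of the kind of 32-bit wrap-around
the 64-bit type avoids. This is not the exact expression from the patch;
the removed amount is a made-up example, and LOAD_AVG_MAX is the usual
47742 maximum of the PELT geometric series:

#include <stdint.h>
#include <stdio.h>

#define LOAD_AVG_MAX	47742	/* max sum of the PELT geometric series */

int main(void)
{
	/* e.g. a large amount of removed load/util from many tasks */
	uint32_t removed = 90 * 1024;

	uint32_t narrow = removed * LOAD_AVG_MAX;	    /* 32-bit multiply wraps */
	uint64_t wide = (uint64_t)removed * LOAD_AVG_MAX;   /* widen first: exact */

	printf("narrow=%u wide=%llu\n", narrow, (unsigned long long)wide);
	return 0;
}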