Message-ID: <20151214144645.GA23930@e105550-lin.cambridge.arm.com>
Date:	Mon, 14 Dec 2015 14:46:46 +0000
From:	Morten Rasmussen <morten.rasmussen@....com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Yuyang Du <yuyang.du@...el.com>,
	Andrey Ryabinin <aryabinin@...tuozzo.com>, mingo@...hat.com,
	linux-kernel@...r.kernel.org, Paul Turner <pjt@...gle.com>,
	Ben Segall <bsegall@...gle.com>
Subject: Re: [PATCH] sched/fair: fix mul overflow on 32-bit systems

On Mon, Dec 14, 2015 at 03:20:21PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 14, 2015 at 01:07:26PM +0000, Morten Rasmussen wrote:
> 
> > Agreed, >100% is a transient state (which can be rather long) that only
> > means over-utilized, nothing more. Would you like the metric itself to
> > be changed to saturate at 100%, or just cap it to 100% when used?
> 
> We already cap it when using it IIRC. But no, I was thinking of the
> measure itself.

Yes, okay.
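
Just to be concrete about what "cap it when used" would look like, a
minimal userspace sketch (cpu_util_capped() is a made-up name;
SCHED_CAPACITY_SCALE is the usual 1024 == 100% fixed-point scale):

  /* Clamp a PELT-style utilization at the point of use, leaving the
   * underlying running sum untouched. */
  #include <stdio.h>

  #define SCHED_CAPACITY_SCALE 1024UL    /* 100% in fixed point */

  static unsigned long cpu_util_capped(unsigned long util_avg)
  {
          /* >100% only means "over-utilized"; consumers see at most 100% */
          return util_avg < SCHED_CAPACITY_SCALE ?
                  util_avg : SCHED_CAPACITY_SCALE;
  }

  int main(void)
  {
          printf("%lu\n", cpu_util_capped(1536));    /* prints 1024 */
          return 0;
  }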

> 
> > It is not straightforward to provide a bound on the sum.
> 
> Agreed..
> 
> > There isn't one for load_avg either.
> 
> But that one is fundamentally unbound, whereas the util thing is
> fundamentally bound, except our implementation isn't.

Agreed.

> 
> > If we want to guarantee an upper bound for
> > cfs_rq->avg.util_sum we have to somehow cap the se->avg.util_avg
> > contributions for each sched_entity. This cap depends on the cpu and how
> > many other tasks are associated with that cpu. The cap may have to
> > change when tasks migrate.
> 
> Yep, blows :-)
> 
> > > However, I think that makes sense, but would propose doing it
> > > differently. That condition is generally a maximum (assuming proper
> > > functioning of the weight-based scheduling etc.) for any one task, so
> > > on migration we can hard-clip to this value.
> 
> > Why use load.weight to scale util_avg? It is affected by priority. Isn't
> > it just the ratio 1/nr_running that you are after?
> 
> Remember, the util thing is based on running, so assuming each task
> always wants to run, each task gets to run w_i/\Sum_j w_j due to CFS
> being a weighted fair queueing thingy.

Of course, yes.
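
For concreteness: with one always-running task at nice 0 and one at
nice 5, the nice-to-weight table gives weights 1024 and 335, so the
split is roughly 75%/25%. A standalone sketch of the arithmetic:

  /* Two CPU-bound tasks on one cpu: each runs for w_i / \Sum_j w_j of
   * the time.  Weights taken from the kernel's nice-to-weight table. */
  #include <stdio.h>

  int main(void)
  {
          unsigned long w0 = 1024;    /* nice 0 */
          unsigned long w1 = 335;     /* nice 5 */
          unsigned long sum = w0 + w1;

          printf("task0: %.1f%%\n", 100.0 * w0 / sum);    /* ~75.4% */
          printf("task1: %.1f%%\n", 100.0 * w1 / sum);    /* ~24.6% */
          return 0;
  }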

> 
> > IIUC, you propose to clip the sum itself. In which case you are running
> > into trouble when removing tasks. You don't know how much to remove from
> > the clipped sum.
> 
> Right, then we'll have to slowly gain it again.

If you have a seriously over-utilized cpu and migrate some of the tasks
to a different cpu, the old cpu may temporarily look lightly utilized
even if we leave some big tasks behind. That might lead us into trouble
if we start using util_avg as the basis for cpufreq decisions. If we
care about performance, the safe choice is to consider a cpu that is
over-utilized as still over-utilized even after we have migrated tasks
away. We can only trust that the cpu is no longer over-utilized when
cfs_rq->avg.util_avg 'naturally' goes below 100%. So from that point of
view, it might be better to let it stay at 100% and let it sort itself
out.
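
In pseudo-C, that conservative policy would be something like the
sketch below. Everything here is hypothetical (the struct, the helper,
PELT_RAMP_NS); it is only meant to illustrate not trusting the
post-migration dip:

  #include <stdio.h>
  #include <stdint.h>

  #define SCHED_CAPACITY_SCALE 1024UL              /* 100% */
  #define PELT_RAMP_NS (100 * 1000 * 1000ULL)      /* made-up grace period */

  struct cpu_state {
          unsigned long util_avg;    /* 1024 == 100% */
          uint64_t below_since;      /* when util first dropped below 100% */
          int overutilized;          /* sticky across task migrations */
  };

  static void update_overutilized(struct cpu_state *cs, uint64_t now)
  {
          if (cs->util_avg >= SCHED_CAPACITY_SCALE) {
                  cs->overutilized = 1;
                  cs->below_since = 0;
                  return;
          }
          if (!cs->below_since)
                  cs->below_since = now;
          /* Only trust the drop once the remaining tasks' util_avg has
           * had time to ramp back up, i.e. the drop is 'natural'. */
          if (now - cs->below_since > PELT_RAMP_NS)
                  cs->overutilized = 0;
  }

  int main(void)
  {
          struct cpu_state cs = { .util_avg = 1536 };

          update_overutilized(&cs, 0);              /* over-utilized */
          cs.util_avg = 300;                        /* big migration just happened */
          update_overutilized(&cs, 1);              /* still flagged: too soon */
          update_overutilized(&cs, 2 * PELT_RAMP_NS);
          printf("%d\n", cs.overutilized);          /* 0: trusted again */
          return 0;
  }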

> > Another problem is that load.weight is just a snapshot while
> > avg.util_avg includes tasks that are not currently on the rq so the
> > scaling factor is probably bigger than what you want.
> 
> Our weight guesstimates also include non-running (aka blocked) tasks,
> right?

The rq/cfs_rq load.weight doesn't. It is updated through
update_load_{add,sub}() in account_entity_{enqueue,dequeue}(), so it
only covers runnable+running tasks, I think.
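
i.e. roughly the following (a standalone model of that accounting, not
the actual kernel source; update_load_{add,sub}() are collapsed into
the += / -=):

  /* Model: cfs_rq->load.weight only tracks entities currently on the
   * runqueue; blocked tasks contribute nothing until re-enqueued. */
  #include <stdio.h>

  struct load_weight { unsigned long weight; };
  struct cfs_rq { struct load_weight load; unsigned int nr_running; };
  struct sched_entity { struct load_weight load; };

  static void account_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
  {
          cfs_rq->load.weight += se->load.weight;  /* update_load_add() */
          cfs_rq->nr_running++;
  }

  static void account_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
  {
          cfs_rq->load.weight -= se->load.weight;  /* update_load_sub() */
          cfs_rq->nr_running--;
  }

  int main(void)
  {
          struct cfs_rq rq = { { 0 }, 0 };
          struct sched_entity a = { { 1024 } }, b = { { 1024 } };

          account_enqueue(&rq, &a);
          account_enqueue(&rq, &b);
          account_dequeue(&rq, &b);                /* b blocks */
          printf("%lu\n", rq.load.weight);         /* 1024: b is gone */
          return 0;
  }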

> > If we leave the sum as it is (unclipped), add/remove shouldn't give us
> > any problems. The only problem is the overflow, which is solved by using
> > a 64-bit type for load_avg. Is that not an acceptable solution?
> 
> It might be. After all, any time any of this is needed we're CPU-bound
> and the utilization measure is pointless anyway. That measure only
> matters if it's small and the sum is 'small'. After that it's back to the
> normal load-based thingy.

Yes, agreed.
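
(For reference, the overflow itself is easy to demonstrate standalone;
LOAD_AVG_MAX is 47742, and the removed-load value below is just a
plausible made-up number:)

  /* A 32-bit 'r' makes r * LOAD_AVG_MAX wrap; widening one operand to
   * 64 bits before the multiply gives the intended result. */
  #include <stdio.h>
  #include <stdint.h>

  #define LOAD_AVG_MAX 47742

  int main(void)
  {
          uint32_t r = 100000;                     /* plausible removed load */
          uint32_t bad = r * LOAD_AVG_MAX;         /* wraps mod 2^32 */
          uint64_t good = (uint64_t)r * LOAD_AVG_MAX;

          printf("32-bit: %u\n", bad);             /* 479232704 (wrapped) */
          printf("64-bit: %llu\n", (unsigned long long)good); /* 4774200000 */
          return 0;
  }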
