Message-ID: <20140603171628.GE29593@e103034-lin>
Date: Tue, 3 Jun 2014 18:16:28 +0100
From: Morten Rasmussen <morten.rasmussen@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
"mingo@...nel.org" <mingo@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux@....linux.org.uk" <linux@....linux.org.uk>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"preeti@...ux.vnet.ibm.com" <preeti@...ux.vnet.ibm.com>,
"efault@....de" <efault@....de>,
"nicolas.pitre@...aro.org" <nicolas.pitre@...aro.org>,
"linaro-kernel@...ts.linaro.org" <linaro-kernel@...ts.linaro.org>,
"daniel.lezcano@...aro.org" <daniel.lezcano@...aro.org>,
Paul Turner <pjt@...gle.com>,
Benjamin Segall <bsegall@...gle.com>
Subject: Re: [PATCH v2 08/11] sched: get CPU's activity statistic
On Tue, Jun 03, 2014 at 04:40:58PM +0100, Peter Zijlstra wrote:
> On Wed, May 28, 2014 at 01:10:01PM +0100, Morten Rasmussen wrote:
> > The rq runnable_avg_{sum, period} give a very long term view of the cpu
> > utilization (I will use the term utilization instead of activity as I
> > think that is what we are talking about here). IMHO, it is too slow to
> > be used as basis for load balancing decisions. I think that was also
> > agreed upon in the last discussion related to this topic [1].
> >
> > The basic problem is the worst case: with sum starting from 0 and
> > period already at LOAD_AVG_MAX = 47742, it takes LOAD_AVG_MAX_N = 345
> > periods (ms) for sum to reach 47742. In other words, the cpu might
> > have been fully utilized for 345 ms before it is considered fully
> > utilized.
> > Periodic load-balancing happens much more frequently than that.
>
> Like said earlier the 94% mark is actually hit much sooner, but yes,
> likely still too slow.
>
> 50% at 32 ms, 75% at 64 ms, 87.5% at 96 ms, etc..
Agreed.
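
For reference, the ramp-up follows directly from the geometric series:
with y chosen such that y^32 = 0.5, a cpu that is fully busy from t = 0
(with period already saturated) reaches a fraction 1 - 0.5^(t/32) of
LOAD_AVG_MAX after t ms. A quick user-space sketch (my own toy code,
not the kernel implementation) that reproduces those numbers:

#include <math.h>
#include <stdio.h>

int main(void)
{
	double y = pow(0.5, 1.0 / 32.0);	/* PELT decay: y^32 == 0.5 */

	/* fraction of LOAD_AVG_MAX reached after t ms of 100% busy */
	for (int t = 32; t <= 352; t += 32)
		printf("%3d ms: %5.1f%%\n", t, 100.0 * (1.0 - pow(y, t)));
	return 0;
}

That puts 93.75% at 128 ms (presumably the ~94% mark you refer to),
while the integer sum only stops growing at LOAD_AVG_MAX_N = 345
periods.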
>
> > Also, if load-balancing actually moves tasks around it may take quite a
> > while before runnable_avg_sum actually reflects this change. The next
> > periodic load-balance is likely to happen before runnable_avg_sum has
> > reflected the result of the previous periodic load-balance.
> >
> > To avoid these problems, we need to base utilization on a metric which
> > is updated instantaneously when we add/remove tasks to a cpu (or at least
> > fast enough that we don't see the above problems).
>
> So the per-task-load-tracking stuff already does that. It updates the
> per-cpu load metrics on migration. See {de,en}queue_entity_load_avg().
I think there is some confusion here. There are two per-cpu load metrics
that track different things.
The cfs.runnable_load_avg is basically the sum of the load contributions
of the tasks on the cfs rq. The sum gets updated whenever tasks are
{en,de}queued by adding/subtracting the load contribution of the task
being added/removed. That is the one you are referring to.
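
In model form (a simplification I just wrote, not the actual
{en,de}queue_entity_load_avg() code), the update is an instantaneous
add/subtract of the task's contribution:

struct model_cfs_rq {
	unsigned long runnable_load_avg;	/* sum of task contributions */
};

struct model_se {
	unsigned long load_avg_contrib;		/* weight * runnable fraction */
};

static void model_enqueue(struct model_cfs_rq *cfs_rq, struct model_se *se)
{
	cfs_rq->runnable_load_avg += se->load_avg_contrib;
}

static void model_dequeue(struct model_cfs_rq *cfs_rq, struct model_se *se)
{
	cfs_rq->runnable_load_avg -= se->load_avg_contrib;
}

So this metric reflects task migrations immediately.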
The rq runnable_avg_sum (actually rq->avg.runnable_avg_{sum, period}) is
tracking whether the cpu has something to do or not. It doesn't matter
how many tasks are runnable or what their load is. It is updated in
update_rq_runnable_avg(). It increases when rq->nr_running > 0 and
decays if not. It also takes time spent running rt tasks into account in
idle_{enter, exit}_fair(). So if you remove tasks from the rq, this
metric will start decaying and eventually reach 0, unlike
cfs.runnable_load_avg, where the task load contribution is subtracted
every time a task is removed. The rq runnable_avg_sum is the one being
used in this patch set.
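
Again in model form (simplified, not the kernel's actual maths): per
1 ms step, both sum and period decay by y, and the sum only accumulates
if anything was runnable:

#include <stdint.h>

struct model_rq_avg {
	uint64_t runnable_avg_sum;	/* decayed time the rq was non-idle */
	uint64_t runnable_avg_period;	/* decayed wall-clock time */
};

/* one 1 ms step; 0xfa83b2da is y * 2^32 with y^32 == 0.5 */
static void model_rq_step(struct model_rq_avg *a, int nr_running)
{
	a->runnable_avg_sum = (a->runnable_avg_sum * 0xfa83b2daULL) >> 32;
	a->runnable_avg_period = (a->runnable_avg_period * 0xfa83b2daULL) >> 32;
	if (nr_running > 0)
		a->runnable_avg_sum += 1024;
	a->runnable_avg_period += 1024;
}

The sum/period ratio is plain time occupancy of the cpu, independent of
task weights and of how many tasks happen to be runnable.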
Ben, pjt, please correct me if I'm wrong.
> And keeping an unweighted per-cpu variant isn't that much more work.
Agreed.
>
> > In the previous
> > discussion [1] it was suggested to use a sum of unweighted task
> > runnable_avg_{sum,period} ratios instead. That is, an unweighted
> > equivalent to weighted_cpuload(). That isn't a perfect solution either.
> > It is fine as long as the cpus are not fully utilized, but when they are
> > we need to use weighted_cpuload() to preserve smp_nice. What to do
> > around the tipping point needs more thought, but I think that is
> > currently the best proposal for a solution for task and cpu utilization.
>
> I'm not too worried about the tipping point, per task runnable figures
> of an overloaded cpu are higher, so migration between an overloaded cpu
> and an underloaded cpu is going to be tricky no matter what we do.
Yes, agreed. I just got the impression that you were concerned about
smp_nice last time we discussed this.
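
To make the weighted/unweighted distinction concrete, a rough
user-space model (names hypothetical, not existing kernel interfaces):

#include <stdint.h>

struct model_task {
	uint32_t runnable_avg_sum;	/* decayed runnable time */
	uint32_t runnable_avg_period;	/* decayed total time */
	unsigned long weight;		/* load weight from nice level */
};

/* weighted: preserves smp_nice, needed when cpus are fully utilized */
static unsigned long model_weighted_load(const struct model_task *t, int n)
{
	unsigned long sum = 0;

	for (int i = 0; i < n; i++)
		sum += t[i].weight * t[i].runnable_avg_sum /
		       (t[i].runnable_avg_period + 1);	/* +1: avoid div by 0 */
	return sum;
}

/* unweighted: pure utilization, meaningful below full utilization */
static unsigned long model_unweighted_load(const struct model_task *t, int n)
{
	unsigned long sum = 0;

	for (int i = 0; i < n; i++)
		sum += 1024UL * t[i].runnable_avg_sum /
		       (t[i].runnable_avg_period + 1);
	return sum;
}

Below full utilization the unweighted sum approximates how much cpu the
tasks actually consume; past the tipping point only the weighted sum
can tell a nice 0 task from a nice 19 one.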
> > rq runnable_avg_sum is useful for decisions where we need a longer term
> > view of the cpu utilization, but I don't see how we can use it as a
> > cpu utilization metric for load-balancing decisions at wakeup or
> > periodically.
>
> So keeping one with a faster decay would add extra per-task storage. But
> it would be possible...
I had the same thought when we discussed potential replacements for
cpu_load[]. It will require some messing around with the nicely
optimized load tracking maths if we want to have load tracking with a
different y-coefficient.
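
For example, the optimized maths rely on precomputed constants for
y^32 = 0.5; a different y (say y^16 = 0.5 for twice as fast decay)
would need its own tables. A quick user-space generator sketch (the
half-life value is just an assumption for illustration):

#include <math.h>
#include <stdio.h>

int main(void)
{
	const int half_life = 16;	/* hypothetical: decay twice as fast */

	/* y^n * 2^32 for n = 0..half_life-1, like runnable_avg_yN_inv[] */
	for (int n = 0; n < half_life; n++) {
		double y_n = pow(0.5, (double)n / half_life);
		printf("0x%08x,%s", (unsigned)(y_n * 0xffffffffu),
		       (n % 4 == 3) ? "\n" : " ");
	}
	return 0;
}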