Message-ID: <20150619060038.GA1240@fixme-laptop.cn.ibm.com>
Date:	Fri, 19 Jun 2015 14:00:38 +0800
From:	Boqun Feng <boqun.feng@...il.com>
To:	Yuyang Du <yuyang.du@...el.com>
Cc:	mingo@...nel.org, peterz@...radead.org,
	linux-kernel@...r.kernel.org, pjt@...gle.com, bsegall@...gle.com,
	morten.rasmussen@....com, vincent.guittot@...aro.org,
	dietmar.eggemann@....com, arjan.van.de.ven@...el.com,
	len.brown@...el.com, rafael.j.wysocki@...el.com,
	fengguang.wu@...el.com, srikar@...ux.vnet.ibm.com
Subject: Re: [PATCH v8 2/4] sched: Rewrite runnable load and utilization
 average tracking

Hi Yuyang,

On Tue, Jun 16, 2015 at 03:26:05AM +0800, Yuyang Du wrote:
> @@ -5977,36 +5786,6 @@ static void attach_tasks(struct lb_env *env)
>  }
>  
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -/*
> - * update tg->load_weight by folding this cpu's load_avg
> - */
> -static void __update_blocked_averages_cpu(struct task_group *tg, int cpu)
> -{
> -	struct sched_entity *se = tg->se[cpu];
> -	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu];
> -
> -	/* throttled entities do not contribute to load */
> -	if (throttled_hierarchy(cfs_rq))
> -		return;
> -
> -	update_cfs_rq_blocked_load(cfs_rq, 1);
> -
> -	if (se) {
> -		update_entity_load_avg(se, 1);
> -		/*
> -		 * We pivot on our runnable average having decayed to zero for
> -		 * list removal.  This generally implies that all our children
> -		 * have also been removed (modulo rounding error or bandwidth
> -		 * control); however, such cases are rare and we can fix these
> -		 * at enqueue.
> -		 *
> -		 * TODO: fix up out-of-order children on enqueue.
> -		 */
> -		if (!se->avg.runnable_avg_sum && !cfs_rq->nr_running)
> -			list_del_leaf_cfs_rq(cfs_rq);
> -	}
> -}
> -
>  static void update_blocked_averages(int cpu)
>  {
>  	struct rq *rq = cpu_rq(cpu);
> @@ -6015,17 +5794,17 @@ static void update_blocked_averages(int cpu)
>  
>  	raw_spin_lock_irqsave(&rq->lock, flags);
>  	update_rq_clock(rq);
> +
>  	/*
>  	 * Iterates the task_group tree in a bottom up fashion, see
>  	 * list_add_leaf_cfs_rq() for details.
>  	 */
>  	for_each_leaf_cfs_rq(rq, cfs_rq) {
> -		/*
> -		 * Note: We may want to consider periodically releasing
> -		 * rq->lock about these updates so that creating many task
> -		 * groups does not result in continually extending hold time.
> -		 */
> -		__update_blocked_averages_cpu(cfs_rq->tg, rq->cpu);
> +		/* throttled entities do not contribute to load */
> +		if (throttled_hierarchy(cfs_rq))
> +			continue;
> +
> +		update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);

We iterate the task_group tree (actually the corresponding cfs_rq tree on
that cpu) because we want to do a bottom-up update, and we want a
bottom-up update because we want to update weighted_cpuload().
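
For reference, weighted_cpuload() only reads the root cfs_rq of that
cpu, so a child cfs_rq's load change is invisible to it until the change
has been folded all the way up. Quoting roughly from memory, the
pre-patch helper is just:

	static unsigned long weighted_cpuload(const int cpu)
	{
		return cpu_rq(cpu)->cfs.runnable_load_avg;
	}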

__update_blocked_averages_cpu(tg, cpu) does three things:

Let's say:
cfs_rq = tg->cfs_rq[cpu]
se = tg->se[cpu]
pcfs_rq = cfs_rq_of(se), which is the parent of cfs_rq.

1. update cfs_rq->blocked_load_avg, and its contrib to its task group.
2. update se->avg and calculate the deltas of se->avg.*_avg_contrib.
3. update pcfs_rq->*_load_avg with the deltas in step 2

In this way, __update_blocked_averages_cpu(tg, cpu) contributes tg's
load changes to its parent, so that update_blocked_averages() can
aggregate all load changes into weighted_cpuload().
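
Roughly, the pattern looks like this (a simplified sketch of the old
path, not the exact code; recompute_entity_contrib() is a made-up name
for the contrib recalculation done inside update_entity_load_avg()):

	cfs_rq  = tg->cfs_rq[cpu];
	se      = tg->se[cpu];
	pcfs_rq = cfs_rq_of(se);

	/* 1. decay cfs_rq->blocked_load_avg and update its contrib to tg */
	update_cfs_rq_blocked_load(cfs_rq, 1);

	/* 2. recompute se->avg.*_avg_contrib, delta is the change */
	delta = recompute_entity_contrib(se);	/* made-up name */

	/* 3. fold the delta into the parent, so it bubbles up the tree */
	pcfs_rq->blocked_load_avg += delta;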

However, update_cfs_rq_load_avg() only updates cfs_rq->avg; the change
won't be contributed or aggregated to cfs_rq's parent anywhere in the
for_each_leaf_cfs_rq() loop, so this is not actually a bottom-up update
any more.
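
That is, to keep the loop bottom-up, each iteration would also have to
push the cfs_rq's change into its parent, something like the sketch
below (propagate_to_parent() is a made-up name, only to show the
missing step):

	for_each_leaf_cfs_rq(rq, cfs_rq) {
		if (throttled_hierarchy(cfs_rq))
			continue;

		update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);

		/* missing: fold this cfs_rq's change into its parent */
		if (cfs_rq->tg->se[rq->cpu])
			propagate_to_parent(cfs_rq);	/* made-up name */
	}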

To fix this, I think we can add an update_cfs_shares(cfs_rq) call after
update_cfs_rq_load_avg(), like:

 	for_each_leaf_cfs_rq(rq, cfs_rq) {
-		/*
-		 * Note: We may want to consider periodically releasing
-		 * rq->lock about these updates so that creating many task
-		 * groups does not result in continually extending hold time.
-		 */
-		__update_blocked_averages_cpu(cfs_rq->tg, rq->cpu);
+		/* throttled entities do not contribute to load */
+		if (throttled_hierarchy(cfs_rq))
+			continue;
+
+		update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
+		update_cfs_shares(cfs_rq);
 	}

However, I think update_cfs_shares() isn't cheap, because it may do a
bottom-up update each time it is called. So how about just updating the
root cfs_rq instead, like:

-	/*
-	 * Iterates the task_group tree in a bottom up fashion, see
-	 * list_add_leaf_cfs_rq() for details.
-	 */
-	for_each_leaf_cfs_rq(rq, cfs_rq) {
-		/*
-		 * Note: We may want to consider periodically releasing
-		 * rq->lock about these updates so that creating many task
-		 * groups does not result in continually extending hold time.
-		 */
-		__update_blocked_averages_cpu(cfs_rq->tg, rq->cpu);
-	}
+	update_cfs_rq_load_avg(rq_clock_task(rq), &rq->cfs);

Thanks and Best Regards,
Boqun
