linux-kernel - Re: [PATCH 2/2] sched: Rewrite per entity runnable load average tracking


Date:	Mon, 7 Jul 2014 12:46:46 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Yuyang Du <yuyang.du@...el.com>
Cc:	mingo@...hat.com, linux-kernel@...r.kernel.org,
	rafael.j.wysocki@...el.com, arjan.van.de.ven@...el.com,
	len.brown@...el.com, alan.cox@...el.com, mark.gross@...el.com,
	pjt@...gle.com, fengguang.wu@...el.com,
	Ben Segall <bsegall@...gle.com>
Subject: Re: [PATCH 2/2] sched: Rewrite per entity runnable load average
 tracking

On Wed, Jul 02, 2014 at 10:30:56AM +0800, Yuyang Du wrote:
> The idea of per entity runnable load average (aggregated to cfs_rq and task_group load)
> was proposed by Paul Turner, and it is still followed by this rewrite. But this rewrite
> is made for the following reasons:
> 
> (1). cfs_rq's load average (namely runnable_load_avg and blocked_load_avg) is updated
> incrementally by one entity at one time, which means the cfs_rq load average is only
> partially updated or asynchronous across its entities (the entity in question is up
> to date and contributes to the cfs_rq, but all other entities are effectively lagging
> behind).
> 
> (2). cfs_rq load average is different between top rq->cfs_rq and task_group's per CPU
> cfs_rqs in whether or not blocked_load_average contributes to the load.

ISTR there was a reason for it; can't remember though, maybe pjt/ben can
remember.

> (3). How task_group's load is tracked is very confusing and complex.
> 
> Therefore, this rewrite tackles these by:
> 
> (1). Combine runnable and blocked load averages for cfs_rq. And track cfs_rq's load average
> as a whole (contributed by all runnable and blocked entities on this cfs_rq).
> 
> (2). Only track task load average. Do not track task_group's per CPU entity average, but
> track that entity's own cfs_rq's aggregated average.
> 
> This rewrite results in significantly less code and the expected consistency and clarity.
> Also, if you plot the previous cfs_rq runnable_load_avg and blocked_load_avg against the
> new rewritten load_avg and compare the lines, you can see the new load_avg is much
> more continuous (no abrupt jumps up and down) and is decayed/updated more quickly and
> synchronously.

OK, I think I see what you're doing. I worry about a few things though:

> +static inline void synchronize_tg_load_avg(struct cfs_rq *cfs_rq, u32 old)
>  {
> +       s32 delta = cfs_rq->avg.load_avg - old;
>  
> +       if (delta)
> +               atomic_long_add(delta, &cfs_rq->tg->load_avg);
>  }

That tg->load_avg cacheline is already red hot glowing, and you've just
increased the amount of updates to it.. That's not going to be pleasant.
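
To make the concern concrete, here is a minimal user-space sketch, not kernel code. The `toy_*` names, the `writes` counter, and the `published` field are all hypothetical. The first helper follows the quoted hunk: every nonzero delta writes to the shared counter. The second is only an assumed illustration of what fewer writes could look like (publish once local drift exceeds a threshold); nothing in this thread proposes it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Toy stand-ins; single-threaded, so 'writes' is a plain counter. */
struct toy_tg { atomic_long load_avg; long writes; };
struct toy_rq { long load_avg; long published; struct toy_tg *tg; };

/* Pattern from the quoted hunk: any nonzero delta touches the shared line. */
static void toy_sync_every_delta(struct toy_rq *rq, long old)
{
	long delta = rq->load_avg - old;

	if (delta) {
		atomic_fetch_add(&rq->tg->load_avg, delta);
		rq->tg->writes++;
	}
}

/* Hypothetical variant: publish only once local drift exceeds 'threshold'. */
static void toy_sync_thresholded(struct toy_rq *rq, long threshold)
{
	long delta = rq->load_avg - rq->published;

	assert(threshold >= 0);
	if (labs(delta) > threshold) {
		atomic_fetch_add(&rq->tg->load_avg, delta);
		rq->published = rq->load_avg;
		rq->tg->writes++;
	}
}

/* Apply 100 unit-sized load changes and count shared-counter writes. */
static long toy_count_writes(int thresholded)
{
	struct toy_tg tg;
	struct toy_rq rq = { .load_avg = 0, .published = 0, .tg = &tg };

	atomic_init(&tg.load_avg, 0);
	tg.writes = 0;

	for (int i = 0; i < 100; i++) {
		long old = rq.load_avg;

		rq.load_avg += 1;
		if (thresholded)
			toy_sync_thresholded(&rq, 16);
		else
			toy_sync_every_delta(&rq, old);
	}
	return tg.writes;
}
```

With these toy inputs, `toy_count_writes(0)` performs one shared write per update, while `toy_count_writes(1)` performs far fewer, at the cost of the shared value lagging by up to the threshold.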


> +static inline void enqueue_entity_load_avg(struct sched_entity *se)
>  {
> +	struct sched_avg *sa = &se->avg;
> +	struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +	u64 now = cfs_rq_clock_task(cfs_rq);
> +	u32 old_load_avg = cfs_rq->avg.load_avg;
> +	int migrated = 0;
>  
> +	if (entity_is_task(se)) {
> +		if (sa->last_update_time == 0) {
> +			sa->last_update_time = now;
> +			migrated = 1;
>  		}
> +		else
> +			__update_load_avg(now, sa, se->on_rq * se->load.weight);
>  	}
>  
> +	__update_load_avg(now, &cfs_rq->avg, cfs_rq->load.weight);
>  
> +	if (migrated)
> +		cfs_rq->avg.load_avg += sa->load_avg;
>  
> +	synchronize_tg_load_avg(cfs_rq, old_load_avg);
>  }

So here you add the task to the cfs_rq avg when it gets migrated in,
however:

> @@ -4552,17 +4326,9 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
>  	struct sched_entity *se = &p->se;
>  	struct cfs_rq *cfs_rq = cfs_rq_of(se);
>  
> +	/* Update task on old CPU, then ready to go (entity must be off the queue) */
> +	__update_load_avg(cfs_rq_clock_task(cfs_rq), &se->avg, 0);
> +	se->avg.last_update_time = 0;
>  
>  	/* We have migrated, no longer consider this task hot */
>  	se->exec_start = 0;

there you don't remove it first..
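
A toy model of the accounting may help show the asymmetry. This is a sketch under simplifying assumptions, not the patch's code: the `toy_*` types and functions are hypothetical, and decay is ignored entirely. In the real patch the source cfs_rq average would decay over time, so the toy only shows the state right after the move.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for cfs_rq and sched_entity averages. */
struct toy_cfs_rq { uint32_t load_avg; };
struct toy_entity { uint32_t load_avg; uint64_t last_update_time; };

/* Like the quoted enqueue path: last_update_time == 0 marks a migrated
 * task, whose load is then added to the destination queue. */
static void toy_enqueue(struct toy_cfs_rq *rq, struct toy_entity *se,
			uint64_t now)
{
	assert(now != 0);	/* 0 is reserved as the "migrated" marker */
	if (se->last_update_time == 0) {
		se->last_update_time = now;
		rq->load_avg += se->load_avg;
	}
}

/* Like the quoted migrate path: marks the entity as migrated but leaves
 * the source queue's average untouched. */
static void toy_migrate_as_posted(struct toy_cfs_rq *src,
				  struct toy_entity *se)
{
	(void)src;
	se->last_update_time = 0;
}

/* Hypothetical variant: subtract the contribution before leaving. */
static void toy_migrate_with_removal(struct toy_cfs_rq *src,
				     struct toy_entity *se)
{
	src->load_avg = src->load_avg > se->load_avg ?
			src->load_avg - se->load_avg : 0;
	se->last_update_time = 0;
}

/* Move one task of load 100 from queue a to queue b; return a + b. */
static uint32_t toy_total_after_move(int remove_first)
{
	struct toy_cfs_rq a = { 0 }, b = { 0 };
	struct toy_entity se = { .load_avg = 100, .last_update_time = 0 };

	toy_enqueue(&a, &se, 1);
	if (remove_first)
		toy_migrate_with_removal(&a, &se);
	else
		toy_migrate_as_posted(&a, &se);
	toy_enqueue(&b, &se, 2);

	return a.load_avg + b.load_avg;
}
```

In this model `toy_total_after_move(0)` counts the task on both queues right after the move, while `toy_total_after_move(1)` counts it once.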

