linux-kernel - Re: [RESEND PATCH 2/3 v5] sched: Rewrite per entity runnable load average tracking

Message-ID: <20141022100411.GC23531@worktop.programming.kicks-ass.net>
Date:	Wed, 22 Oct 2014 12:04:11 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Yuyang Du <yuyang.du@...el.com>
Cc:	mingo@...hat.com, linux-kernel@...r.kernel.org, pjt@...gle.com,
	bsegall@...gle.com, arjan.van.de.ven@...el.com,
	len.brown@...el.com, rafael.j.wysocki@...el.com,
	alan.cox@...el.com, mark.gross@...el.com, fengguang.wu@...el.com
Subject: Re: [RESEND PATCH 2/3 v5] sched: Rewrite per entity runnable load
 average tracking

On Fri, Oct 10, 2014 at 10:21:56AM +0800, Yuyang Du wrote:
> +/* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
> +static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  {
> +	int decayed;
>  
> +	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
> +		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> +		cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - r, 0);
> +		cfs_rq->avg.load_sum =
> +			max_t(s64, cfs_rq->avg.load_sum - r * LOAD_AVG_MAX, 0);
>  	}
>  
> +	decayed = __update_load_avg(now, &cfs_rq->avg, cfs_rq->load.weight);
>  
> +#ifndef CONFIG_64BIT
> +	smp_wmb();
> +	cfs_rq->load_last_update_time_copy = cfs_rq->avg.last_update_time;
> +#endif
>  
> -static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
> +	return decayed;
> +}


> +void remove_entity_load_avg(struct sched_entity *se)
>  {
> +	struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +	u64 last_update_time;
> +
> +#ifndef CONFIG_64BIT
> +	u64 last_update_time_copy;
> +
> +	do {
> +		last_update_time_copy = cfs_rq->load_last_update_time_copy;
> +		smp_rmb();
> +		last_update_time = cfs_rq->avg.last_update_time;
> +	} while (last_update_time != last_update_time_copy);
> +#else
> +	last_update_time = cfs_rq->avg.last_update_time;
> +#endif
>  
> +	__update_load_avg(last_update_time, &se->avg, 0);
> +	atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
>  }


> +static void migrate_task_rq_fair(struct task_struct *p, int next_cpu)
>  {
>  	/*
> +	 * We are supposed to update the task to "current" time, then its up to date
> +	 * and ready to go to new CPU/cfs_rq. But we have difficulty in getting
> +	 * what current time is, so simply throw away the out-of-date time. This
> +	 * will result in the wakee task is less decayed, but giving the wakee more
> +	 * load sounds not bad.
>  	 */
> +	remove_entity_load_avg(&p->se);
> +
> +	/* Tell new CPU we are migrated */
> +	p->se.avg.last_update_time = 0;
>  
>  	/* We have migrated, no longer consider this task hot */
> +	p->se.exec_start = 0;
>  }


Because of:

  entity_tick()
    update_load_avg()
      update_cfs_rq_load_avg()

we're likely to only lag TICK_NSEC behind, right? And thus the
truncation we do in migrate_task_rq_fair() is of equal size.

Hmm, one problem: a cgroup cfs_rq can be idle for a long while and not
get any ticks at all, so those can lag unbounded. Then again, this
appears to be a problem in the current code too, hmm..

Anybody?

