linux-kernel - Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average tracking

Message-ID: <20140728171909.GW19379@twins.programming.kicks-ass.net>
Date:	Mon, 28 Jul 2014 19:19:09 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	bsegall@...gle.com
Cc:	Yuyang Du <yuyang.du@...el.com>, mingo@...hat.com,
	linux-kernel@...r.kernel.org, pjt@...gle.com,
	arjan.van.de.ven@...el.com, len.brown@...el.com,
	rafael.j.wysocki@...el.com, alan.cox@...el.com,
	mark.gross@...el.com, fengguang.wu@...el.com
Subject: Re: [PATCH 2/2 v4] sched: Rewrite per entity runnable load average
 tracking

On Mon, Jul 28, 2014 at 09:58:19AM -0700, bsegall@...gle.com wrote:
> Peter Zijlstra <peterz@...radead.org> writes:
> 
> >> @@ -4551,18 +4382,34 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
> >>  {
> >>  	struct sched_entity *se = &p->se;
> >>  	struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >> +	u64 last_update_time;
> >>  
> >>  	/*
> >> +	 * Task on old CPU catches up with its old cfs_rq, and subtract itself from
> >> +	 * the cfs_rq (task must be off the queue now).
> >>  	 */
> >> +#ifndef CONFIG_64BIT
> >> +	u64 last_update_time_copy;
> >> +
> >> +	do {
> >> +		last_update_time_copy = cfs_rq->load_last_update_time_copy;
> >> +		smp_rmb();
> >> +		last_update_time = cfs_rq->avg.last_update_time;
> >> +	} while (last_update_time != last_update_time_copy);
> >> +#else
> >> +	last_update_time = cfs_rq->avg.last_update_time;
> >> +#endif
> >> +	__update_load_avg(last_update_time, &se->avg, 0);
> >> +	atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
> >> +
> >> +	/*
> >> +	 * We are supposed to update the task to "current" time, then its up to date
> >> +	 * and ready to go to new CPU/cfs_rq. But we have difficulty in getting
> >> +	 * what current time is, so simply throw away the out-of-date time. This
> >> +	 * will result in the wakee task is less decayed, but giving the wakee more
> >> +	 * load sounds not bad.
> >> +	 */
> >> +	se->avg.last_update_time = 0;
> >>  
> >>  	/* We have migrated, no longer consider this task hot */
> >>  	se->exec_start = 0;
> >
> >
> > And here we try and make good on that assumption. The thing I worry
> > about is what happens if the machine is entirely idle...
> >
> > What guarantees an semi up-to-date cfs_rq->avg.last_update_time.
> 
> update_blocked_averages I think should do just as good a job as the old
> code, which isn't perfect but is about as good as you can get worst case.

Right, that's called from rebalance_domains(), which should more or less
update this value on tick boundaries or thereabouts for most 'active'
CPUs.

But if the entire machine is idle, the first wakeup (if it's a cross-CPU
one) might see a very stale timestamp.
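
To put rough numbers on the stale-timestamp concern, here is a minimal
userspace sketch, not kernel code. It assumes the usual load-tracking decay
(periods of 1024us, with the factor y chosen so that y^32 = 1/2); the helper
names and the fixed-point constant are illustrative stand-ins.

```c
#include <stdint.h>

/* Assumption: one decay period is 1024us, expressed here in ns. */
#define PERIOD_NS   (1024ULL * 1024ULL)
/* Assumption: y = 2^(-1/32), scaled by 2^32 for fixed-point math. */
#define Y_FIXED     0xfa83b2daULL

/* Decay 'load' (must be < 2^32) by 'periods' periods: load * y^periods. */
static uint64_t decay_load(uint64_t load, uint64_t periods)
{
	if (periods >= 64 * 32)
		return 0;		/* decayed to nothing */

	load >>= periods / 32;		/* every 32 periods halves the load */
	periods %= 32;
	while (periods--)
		load = (load * Y_FIXED) >> 32;

	return load;
}

/*
 * Hypothetical helper: catch a blocked entity up to the timestamp taken
 * from its old cfs_rq. If that timestamp is stale, fewer periods elapse
 * and the entity keeps more load than it should.
 */
static uint64_t catch_up(uint64_t load, uint64_t se_last_ns,
			 uint64_t rq_last_ns)
{
	return decay_load(load, (rq_last_ns - se_last_ns) / PERIOD_NS);
}
```

With a load of 1024 blocked for 64 periods, catching up to the true time
gives 256, whereas a cfs_rq timestamp that stopped advancing after 32
periods gives 512, i.e. the wakee looks twice as heavy as it should.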

If we can fix that, that would be good I suppose. I'm not immediately
seeing anything pretty there, but you're right, it'd be no worse than the
current situation.
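
For anyone following the !CONFIG_64BIT loop in the quoted hunk, the
copy-and-retry read can be sketched in userspace roughly as below. This is
an illustration only: the struct and function names are hypothetical, the
C11 fences merely stand in for smp_wmb()/smp_rmb(), and the writer side is
an assumption since it is not part of the quoted hunk.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the two timestamp fields on the cfs_rq. */
struct stamp {
	volatile uint64_t last_update_time;
	volatile uint64_t last_update_time_copy;
};

/* Assumed writer side: publish the value first, then its copy. */
static void stamp_write(struct stamp *s, uint64_t now)
{
	s->last_update_time = now;
	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */
	s->last_update_time_copy = now;
}

/*
 * Reader side, mirroring the quoted loop: read the copy, then the value,
 * and retry until both agree. On a 32-bit machine a 64-bit load may be
 * split in two, so a mismatch signals a concurrent update.
 */
static uint64_t stamp_read(const struct stamp *s)
{
	uint64_t copy, val;

	do {
		copy = s->last_update_time_copy;
		atomic_thread_fence(memory_order_acquire);	/* ~ smp_rmb() */
		val = s->last_update_time;
	} while (val != copy);

	return val;
}
```

Strictly speaking, concurrent plain accesses like these are a data race in
ISO C; the sketch only shows the ordering of the reads and writes.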
