linux-kernel - Re: [PATCH] sched: prevent sched entity from being decayed twice when both waking and migrating it

Date:	Sun, 19 Jul 2015 15:03:03 +0900
From:	Byungchul Park <byungchul.park@....com>
To:	bsegall@...gle.com
Cc:	pjt@...gle.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched: prevent sched entity from being decayed twice
 when both waking and migrating it

On Fri, Jul 17, 2015 at 10:02:22AM -0700, bsegall@...gle.com wrote:
> Byungchul Park <byungchul.park@....com> writes:
> 
> > On Thu, Jul 16, 2015 at 10:00:00AM -0700, bsegall@...gle.com wrote:
> >
> > hello,
> >
> >> byungchul.park@....com writes:
> >> 
> >> > From: Byungchul Park <byungchul.park@....com>
> >> >
> >> > hello paul,
> >> >
> >> > can i ask you something?
> >> >
> >> > when a sched entity is both woken and migrated, it looks like it is being decayed twice.
> >> > did you do it on purpose?
> >> > or am i missing something? :(
> >> >
> >> > thanks,
> >> > byungchul
> >> 
> >> __synchronize_entity_decay() updates only se->avg.load_avg_contrib so
> >> that removing from blocked_load is done correctly.
> >
> > as you said, it should be done here. :)
> >
> >> update_entity_load_avg() accounts that (approximation of) time blocked
> >
> > i mean the entity's blocked time was already accounted for in
> > __synchronize_entity_decay().
> >
> >> against runnable_avg/running_avg (and then recomputes load_avg_contrib
> >> to match while load_avg_contrib isn't part of any cfs_rq's sum).
> >
> > the thing to keep in mind is that load tracking is currently done
> > per-entity. that is, after __synchronize_entity_decay() the entity already
> > has its whole load_avg_contrib with the blocked time taken into account,
> > and cfs_rq can account the se's load by adding se->avg.load_avg_contrib to
> > cfs_rq->runnable_load_avg, as the enqueue_entity_load_avg() code does.
> >
> > wrong?
> 
> load_avg_contrib is computed from runnable_avg, which is not updated by
> __synchronize_entity_decay, only by update_entity_load_avg ->
> __update_entity_runnable_avg. __synchronize_entity_decay is used in this path
> because update_entity_load_avg needs the rq lock (along with some other
> reasons), and migrate_task_rq_fair generally doesn't have the lock.
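
The distinction drawn in the quoted explanation can be sketched as a toy model. This is only an illustration under stated assumptions, not the kernel code: the `toy_*` names, the per-period loop, and the starting value 47742 are hypothetical simplifications. It shows that decaying load_avg_contrib alone leaves the runnable sums untouched, and that the later update decays those sums and then recomputes (overwrites) load_avg_contrib, so the sleep time is applied to each variable once.

```c
#include <assert.h>
#include <stdint.h>

/* y in 32.32 fixed point, chosen so that y^32 == 0.5 (y is about 0.9786). */
#define TOY_Y_FIXED 0xfa83b2daULL
/* Assumption: each elapsed period adds 1024 units to the period sum. */
#define TOY_PERIOD_UNITS 1024ULL

/* Hypothetical stand-in for the per-entity averages discussed above. */
struct toy_avg {
	uint64_t runnable_avg_sum;
	uint64_t runnable_avg_period;
	uint64_t load_avg_contrib;
};

/* Decay val by y^periods, one period at a time (slow but simple). */
static uint64_t toy_decay(uint64_t val, unsigned int periods)
{
	assert(val < (1ULL << 32)); /* keeps the multiply from overflowing */
	while (periods--)
		val = (val * TOY_Y_FIXED) >> 32;
	return val;
}

/* Models the __synchronize_entity_decay() role: only the contrib decays. */
static void toy_synchronize_decay(struct toy_avg *avg, unsigned int periods)
{
	avg->load_avg_contrib = toy_decay(avg->load_avg_contrib, periods);
}

/*
 * Models the update_entity_load_avg() role for a sleeper: the sums absorb
 * the blocked time, then the contrib is recomputed from them. The old
 * contrib value is overwritten, not decayed a second time.
 */
static void toy_update_after_sleep(struct toy_avg *avg, uint64_t weight,
				   unsigned int periods)
{
	while (periods--) {
		avg->runnable_avg_sum = toy_decay(avg->runnable_avg_sum, 1);
		avg->runnable_avg_period =
			toy_decay(avg->runnable_avg_period, 1) + TOY_PERIOD_UNITS;
	}
	avg->load_avg_contrib = weight * avg->runnable_avg_sum /
				(avg->runnable_avg_period + 1);
}

/* Final contrib of an always-runnable entity that sleeps, then migrates. */
static uint64_t toy_wake_and_migrate(uint64_t weight, unsigned int slept)
{
	struct toy_avg avg = { 47742, 47742, 0 };

	avg.load_avg_contrib = weight * avg.runnable_avg_sum /
			       (avg.runnable_avg_period + 1);
	toy_synchronize_decay(&avg, slept);          /* migration side */
	toy_update_after_sleep(&avg, weight, slept); /* enqueue side */
	return avg.load_avg_contrib;
}
```

With a weight of 1024 and 32 slept periods (one half-life in this model), toy_wake_and_migrate() ends near half of the starting contrib rather than near a quarter, which is what a genuine double decay would produce.
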

hello ben,

i see...
i missed the fact that __synchronize_entity_decay() is only for the blocked one.
i am sorry for bothering you. ;(

thank you very much,
byungchul

> 
> >
> > thanks,
> > byungchul
> >
> >> 
> >> >
> >> > --------------->8---------------
> >> > From 793c963d0b29977a0f6f9330291a9ea469cc54f0 Mon Sep 17 00:00:00 2001
> >> > From: Byungchul Park <byungchul.park@....com>
> >> > Date: Thu, 16 Jul 2015 16:49:48 +0900
> >> > Subject: [PATCH] sched: prevent sched entity from being decayed twice when
> >> >  both waking and migrating it
> >> >
> >> > the current code decays the load average variables by the sleep time
> >> > twice when an entity is both woken and migrated. the first decay happens
> >> > in the call path "migrate_task_rq_fair() -> __synchronize_entity_decay()".
> >> > the second decay happens in the call path "enqueue_entity_load_avg() ->
> >> > update_entity_load_avg()". so make it happen once.
> >> >
> >> > Signed-off-by: Byungchul Park <byungchul.park@....com>
> >> > ---
> >> >  kernel/sched/fair.c |   29 +++--------------------------
> >> >  1 file changed, 3 insertions(+), 26 deletions(-)
> >> >
> >> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> > index 09456fc..c86cca0 100644
> >> > --- a/kernel/sched/fair.c
> >> > +++ b/kernel/sched/fair.c
> >> > @@ -2873,32 +2873,9 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
> >> >  						  struct sched_entity *se,
> >> >  						  int wakeup)
> >> >  {
> >> > -	/*
> >> > -	 * We track migrations using entity decay_count <= 0, on a wake-up
> >> > -	 * migration we use a negative decay count to track the remote decays
> >> > -	 * accumulated while sleeping.
> >> > -	 *
> >> > -	 * Newly forked tasks are enqueued with se->avg.decay_count == 0, they
> >> > -	 * are seen by enqueue_entity_load_avg() as a migration with an already
> >> > -	 * constructed load_avg_contrib.
> >> > -	 */
> >> > -	if (unlikely(se->avg.decay_count <= 0)) {
> >> > +	/* we track migrations using entity decay_count == 0 */
> >> > +	if (unlikely(!se->avg.decay_count)) {
> >> >  		se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
> >> > -		if (se->avg.decay_count) {
> >> > -			/*
> >> > -			 * In a wake-up migration we have to approximate the
> >> > -			 * time sleeping.  This is because we can't synchronize
> >> > -			 * clock_task between the two cpus, and it is not
> >> > -			 * guaranteed to be read-safe.  Instead, we can
> >> > -			 * approximate this using our carried decays, which are
> >> > -			 * explicitly atomically readable.
> >> > -			 */
> >> > -			se->avg.last_runnable_update -= (-se->avg.decay_count)
> >> > -							<< 20;
> >> > -			update_entity_load_avg(se, 0);
> >> > -			/* Indicate that we're now synchronized and on-rq */
> >> > -			se->avg.decay_count = 0;
> >> > -		}
> >> >  		wakeup = 0;
> >> >  	} else {
> >> >  		__synchronize_entity_decay(se);
> >> > @@ -5114,7 +5091,7 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
> >> >  	 * be negative here since on-rq tasks have decay-count == 0.
> >> >  	 */
> >> >  	if (se->avg.decay_count) {
> >> > -		se->avg.decay_count = -__synchronize_entity_decay(se);
> >> > +		__synchronize_entity_decay(se);
> >> >  		atomic_long_add(se->avg.load_avg_contrib,
> >> >  						&cfs_rq->removed_load);
> >> >  	}
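
The back-dating step in the lines the patch above removes ("last_runnable_update -= (-decay_count) << 20") can be sketched in isolation. This is a minimal illustration, assuming one decay period is 1024 * 1024 ns (about 1 ms), which is what the shift by 20 implies; the function name and the plain integer types are hypothetical and not taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one decay period is 2^20 ns, so periods << 20 gives ns. */
#define TOY_PERIOD_SHIFT 20

/*
 * Given the destination cpu's current clock and a non-positive decay_count
 * (the negated number of periods the entity slept), return an approximate
 * "last update" timestamp on the destination clock. A later update from
 * that timestamp to now_ns then accounts for roughly the slept time.
 */
static uint64_t toy_backdate_last_update(uint64_t now_ns, long decay_count)
{
	uint64_t slept_periods, slept_ns;

	assert(decay_count <= 0);
	slept_periods = 0 - (uint64_t)decay_count; /* negate without overflow */
	slept_ns = slept_periods << TOY_PERIOD_SHIFT;
	assert(slept_ns <= now_ns);
	return now_ns - slept_ns;
}
```

For example, with now_ns equal to 100 periods and a decay_count of -3, the returned timestamp sits 3 periods in the past, so no clock from the source cpu has to be read.
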
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> >> the body of a message to majordomo@...r.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> Please read the FAQ at  http://www.tux.org/lkml/