linux-kernel - Re: [PATCH] sched: Update task group load contributions during active load-balancing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <xm26k352qfz7.fsf@sword-of-the-dawn.mtv.corp.google.com>
Date:	Wed, 17 Sep 2014 11:06:20 -0700
From:	bsegall@...gle.com
To:	Morten Rasmussen <morten.rasmussen@....com>
Cc:	"peterz\@infradead.org" <peterz@...radead.org>,
	"mingo\@kernel.org" <mingo@...nel.org>,
	"linux-kernel\@vger.kernel.org" <linux-kernel@...r.kernel.org>,
	Paul Turner <pjt@...gle.com>
Subject: Re: [PATCH] sched: Update task group load contributions during active load-balancing

Morten Rasmussen <morten.rasmussen@....com> writes:

> On Tue, Sep 16, 2014 at 06:49:14PM +0100, bsegall@...gle.com wrote:
>> Morten Rasmussen <morten.rasmussen@....com> writes:
>> 
>> > Task group load-contributions are not updated when tasks belonging to
>> > task groups are migrated by active load-balancing. If no other task
>> > belonging to the same task group is already queued at the destination
>> > cpu the group sched_entity will be enqueued with load_avg_contrib=0.
>> > Hence, weighted_cpuload() won't reflect the newly added load.
>> >
>> > The load may remain invisible until the next tick, when the sched_entity
>> > load_avg_contrib and task group contributions are reevaluated.
>> >
>> > The enqueue loop
>> >
>> > 	for_each_entity(se) {
>> > 	   enqueue_entity(cfs_rq,se)
>> > 	      ...
>> > 	      enqueue_entity_load_avg(cfs_rq,se)
>> > 	         ...
>> > 	         update_entity_load_avg(se)
>> > 		    ...
>> > 		    __update_entity_load_avg_contrib(se)
>> > 		    ...
>> > 		 ...
>> > 	         update_cfs_rq_blocked_load(cfs_rq)
>> > 		    ...
>> > 		    __update_cfs_rq_tg_load_contrib(cfs_rq)
>> > 		    ...
>> > 	}
>> >
>> > currently skips __update_entity_load_avg_contrib() and
>> > __update_cfs_rq_tg_load_contrib() for group entities for active
>> > load-balance migrations. The former updates the sched_entity
>> > load_avg_contrib, and the latter updates the task group contribution
>> > which is needed by the former. They must both be called to ensure that
>> > load doesn't temporarily disappear.
>> >
>> > cc: Paul Turner <pjt@...gle.com>
>> > cc: Ben Segall <bsegall@...gle.com>
>> >
>> > Signed-off-by: Morten Rasmussen <morten.rasmussen@....com>
>> > ---
>> >  kernel/sched/fair.c |    7 ++++++-
>> >  1 file changed, 6 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index be9e97b..2b6e2eb 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -2521,7 +2521,8 @@ static inline void update_entity_load_avg(struct sched_entity *se,
>> >  	else
>> >  		now = cfs_rq_clock_task(group_cfs_rq(se));
>> >  
>> > -	if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq))
>> > +	if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq) &&
>> > +							entity_is_task(se))
>> >  		return;
>> >  
>> >  	contrib_delta = __update_entity_load_avg_contrib(se);
>> > @@ -2609,6 +2610,10 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
>> >  	cfs_rq->runnable_load_avg += se->avg.load_avg_contrib;
>> >  	/* we force update consideration on load-balancer moves */
>> >  	update_cfs_rq_blocked_load(cfs_rq, !wakeup);
>> > +
>> > +	/* We force update group contributions on load-balancer moves */
>> > +	if (wakeup && !entity_is_task(se))
>> > +		__update_cfs_rq_tg_load_contrib(cfs_rq, 0);
>> >  }
>> >  
>> >  /*
>> 
>> It should probably be clearer that what this actually means is that we
>> always update tg_load_contrib for any group-of-group (which is weird),
>
> Would it not only update groups entities if they aren't already on a
> runqueue? I'm clearly missing something.

No, that's correct, but it will do it on normal sleep/wake too, not just
load-balancer moves.

>
>> so that we update the entire tree on load-balancer moves (which is
>> sensible and what we care about, because it determines each se.load_avg_contrib).
>> 
>> Wouldn't we need this on dequeue as well?
>
> Yes. The problem is there too I think. It is not as visible though. If
> the last task of a particular group is migrated away from a cpu, the
> group entities will be dequeued and we should be fine. However, if there
> are other tasks left in the group or any parent group the load of that
> group will be off, I think. I need to verify that.

Yeah, the case where it comes up would be different, and possibly less
bad.

>
>> 
>> I believe the reason for both of these ratelimits was performance on
>> sleep/wake (where blocked_load_avg means tg_load_contrib should not have
>> changed), it would be good to know how much this hurts. Given the
>> fast-exit in __update_cfs_rq_tg_load_contrib, it seems like it might be
>> fine, but the group-entity __update_entity_load_avg_contrib is less
>> free when div_u64 is expensive, and is currently only called 1/ms.
>
> Right. Is there a suitable group scheduling benchmark I can use to test
> this?

The most severe would probably be pipe test where you moved the threads
to be in separate cgroups. A more realistic one would probably just be
looping around nanosleep(100ns) on a bunch of cpus or something like that.

>
>> 
>> It seems like it would be a bit of a mess to force an update of the
>> entire path in migrate, but it would certainly dodge the performance
>> concerns if they wind up being an issue. (I suppose we could have
>> ENQUEUE/DEQUEUE_MIGRATE or something...)
>
> Yes, I had the same thought. We might have to do something like that
> anyway to fix the dequeue path. It seems less straightforward to fix,
> but I need to take a closer look.
>
> Morten
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/