Message-ID: <20190731093525.GH31425@hirez.programming.kicks-ass.net>
Date: Wed, 31 Jul 2019 11:35:25 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Rik van Riel <riel@...riel.com>
Cc: linux-kernel@...r.kernel.org, kernel-team@...com, pjt@...gle.com,
dietmar.eggemann@....com, mingo@...hat.com,
morten.rasmussen@....com, tglx@...utronix.de,
mgorman@...hsingularity.net, vincent.guittot@...aro.org
Subject: Re: [PATCH 09/14] sched,fair: refactor enqueue/dequeue_entity
On Tue, Jul 30, 2019 at 11:36:17AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 22, 2019 at 01:33:43PM -0400, Rik van Riel wrote:
> > +static bool
> > +enqueue_entity_groups(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> > +{
> > + /*
> > + * When enqueuing a sched_entity, we must:
> > + * - Update loads to have both entity and cfs_rq synced with now.
> > + * - Add its load to cfs_rq->runnable_avg
> > + * - For group_entity, update its weight to reflect the new share of
> > + * its group cfs_rq
> > + * - Add its new weight to cfs_rq->load.weight
> > + */
> > + if (!update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH))
> > + return false;
> > +
> > + update_cfs_group(se);
> > + return true;
> > +}
> No functional change, but you did make update_cfs_group() conditional. Now that
> looks OK, but maybe you can do that part in a separate patch with a
> little justification of its own.
To record (and extend) our discussion from IRC yesterday: I now think
the above is in fact a problem.
The thing is that update_cfs_group() does not rely solely on the tg state;
it also contains magic to deal with ramp up, for which you later
introduce that max_h_load thing.
Specifically (re)read the second part of the comment describing
calc_group_shares() where it explains the ramp up:
* The problem with it is that because the average is slow -- it was designed
* to be exactly that of course -- this leads to transients in boundary
* conditions. In specific, the case where the group was idle and we start the
* one task. It takes time for our CPU's grq->avg.load_avg to build up,
* yielding bad latency etc..
(and further)
So by not always calling update_cfs_group() (and not updating h_load) you
fail to take advantage of that ramp-up handling.
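To make the ramp-up case concrete, here is a toy user-space model. It is
only a sketch: struct toy_grq and both toy_shares_*() helpers are
hypothetical names, not kernel code, and other_load_avg stands in for the
load_avg contributed by the group's runqueues on the other CPUs. It
compares a share computed from the slow average alone with one that uses
max(load.weight, avg.load_avg), which is how I read the approximation
described in the calc_group_shares() comment.

```c
/*
 * Toy model only: these names are hypothetical stand-ins, not the
 * kernel's struct cfs_rq or calc_group_shares().
 */
struct toy_grq {
	long load_weight;	/* instantaneous weight of the runnable tasks */
	long load_avg;		/* slow-moving load average */
};

/* Share of tg_shares computed from the slow average alone. */
static long toy_shares_avg_only(long tg_shares, long other_load_avg,
				const struct toy_grq *grq)
{
	long load = grq->load_avg;
	long tg_weight = other_load_avg + load;

	return tg_weight ? tg_shares * load / tg_weight : 0;
}

/*
 * Share of tg_shares computed from max(instantaneous weight, average),
 * so a freshly woken group gets a sensible share right away.
 */
static long toy_shares_ramp_up(long tg_shares, long other_load_avg,
			       const struct toy_grq *grq)
{
	long load = grq->load_weight > grq->load_avg ?
		    grq->load_weight : grq->load_avg;
	long tg_weight = other_load_avg + load;

	return tg_weight ? tg_shares * load / tg_weight : 0;
}
```

With tg_shares = 1024, other_load_avg = 1024 and a runqueue that has just
gone from idle to one task (load_weight = 1024, load_avg = 0), the
average-only variant yields 0 while the ramp-up variant yields 512. Once
load_avg has caught up to 1024, both yield 512. The point is that the
second form only helps if it is actually re-evaluated at enqueue time.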
So I would suggest keeping update_cfs_group() unconditional, and
recomputing the h_load for the entire hierarchy on enqueue.