Date:   Wed, 31 Jul 2019 11:03:01 -0400
From:   Rik van Riel <riel@...riel.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, kernel-team@...com, pjt@...gle.com,
        dietmar.eggemann@....com, mingo@...hat.com,
        morten.rasmussen@....com, tglx@...utronix.de,
        mgorman@...hsingularity.net, vincent.guittot@...aro.org
Subject: Re: [PATCH 09/14] sched,fair: refactor enqueue/dequeue_entity

On Wed, 2019-07-31 at 11:35 +0200, Peter Zijlstra wrote:
> On Tue, Jul 30, 2019 at 11:36:17AM +0200, Peter Zijlstra wrote:
> > On Mon, Jul 22, 2019 at 01:33:43PM -0400, Rik van Riel wrote:
> > > +static bool
> > > +enqueue_entity_groups(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> > > +{
> > > +	/*
> > > +	 * When enqueuing a sched_entity, we must:
> > > +	 *   - Update loads to have both entity and cfs_rq synced with now.
> > > +	 *   - Add its load to cfs_rq->runnable_avg
> > > +	 *   - For group_entity, update its weight to reflect the new share of
> > > +	 *     its group cfs_rq
> > > +	 *   - Add its new weight to cfs_rq->load.weight
> > > +	 */
> > > +	if (!update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH))
> > > +		return false;
> > > +
> > > +	update_cfs_group(se);
> > > +	return true;
> > > +}
> > No functional change, but you did make update_cfs_group() conditional.
> > Now that looks OK, but maybe you can do that part in a separate patch
> > with a little justification of its own.
> 
> To record (and extend) our discussion from IRC yesterday; I now do
> think the above is in fact a problem.
> 
> The thing is that update_cfs_group() does not solely rely on the tg
> state; it also contains magic to deal with ramp up, for which you
> later introduce that max_h_load thing.
> 
> Specifically (re)read the second part of the comment describing
> calc_group_shares() where it explains the ramp up:
> 
>  * The problem with it is that because the average is slow -- it was
>  * designed to be exactly that of course -- this leads to transients
>  * in boundary conditions. In specific, the case where the group was
>  * idle and we start the one task. It takes time for our CPU's
>  * grq->avg.load_avg to build up, yielding bad latency etc..
> 
>  (and further)
> 
> So by not always calling this (and not updating h_load) you fail to
> take advantage of this.
> 
> So I would suggest keeping update_cfs_group() unconditional, and
> recomputing the h_load for the entire hierarchy on enqueue.
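
For anyone following along: the ramp-up Peter refers to is the max()
clamp in calc_group_shares(). A lightly paraphrased sketch of that
function, from kernel/sched/fair.c of roughly this era (exact details
vary by kernel version):

static long calc_group_shares(struct cfs_rq *cfs_rq)
{
	long tg_weight, tg_shares, load, shares;
	struct task_group *tg = cfs_rq->tg;

	tg_shares = READ_ONCE(tg->shares);

	/*
	 * The ramp-up magic: use the instantaneous load.weight when it
	 * exceeds the (slow) PELT average, so a freshly busy group gets
	 * a reasonable share before grq->avg.load_avg has built up.
	 */
	load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);

	tg_weight = atomic_long_read(&tg->load_avg);

	/* Ensure tg_weight >= load */
	tg_weight -= cfs_rq->tg_load_avg_contrib;
	tg_weight += load;

	shares = (tg_shares * load);
	if (tg_weight)
		shares /= tg_weight;

	return clamp_t(long, shares, MIN_SHARES, tg_shares);
}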

I think I understand the problem you are pointing out, but if
update_load_avg() leaves the load average for the runqueue unchanged
(because that update is rate limited to once a jiffy, and has been
like that for a while), why would calc_group_shares() return a
different value than it did the last time it was called?
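
To spell the question out as code (against the calc_group_shares()
sketch above; update_load_avg() here is the rate limited variant from
this series, which returns false when it changed nothing):

	if (!update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH)) {
		/*
		 * The premise: the averages calc_group_shares() reads
		 * are exactly as they were on the previous call, so
		 * update_cfs_group() would just recompute the shares
		 * value it already computed.
		 */
		return false;
	}
	update_cfs_group(se);	/* averages moved; shares may change */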

What am I overlooking?

-- 
All Rights Reversed.
