linux-kernel - Re: [PATCH 09/14] sched,fair: refactor enqueue/dequeue

Message-ID: <20190731093525.GH31425@hirez.programming.kicks-ass.net>
Date:   Wed, 31 Jul 2019 11:35:25 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Rik van Riel <riel@...riel.com>
Cc:     linux-kernel@...r.kernel.org, kernel-team@...com, pjt@...gle.com,
        dietmar.eggemann@....com, mingo@...hat.com,
        morten.rasmussen@....com, tglx@...utronix.de,
        mgorman@...hsingularity.net, vincent.guittot@...aro.org
Subject: Re: [PATCH 09/14] sched,fair: refactor enqueue/dequeue_entity

On Tue, Jul 30, 2019 at 11:36:17AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 22, 2019 at 01:33:43PM -0400, Rik van Riel wrote:

> > +static bool
> > +enqueue_entity_groups(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> > +{
> > +	/*
> > +	 * When enqueuing a sched_entity, we must:
> > +	 *   - Update loads to have both entity and cfs_rq synced with now.
> > +	 *   - Add its load to cfs_rq->runnable_avg
> > +	 *   - For group_entity, update its weight to reflect the new share of
> > +	 *     its group cfs_rq
> > +	 *   - Add its new weight to cfs_rq->load.weight
> > +	 */
> > +	if (!update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH))
> > +		return false;
> > +
> > +	update_cfs_group(se);
> > +	return true;
> > +}

> No functional, but you did make update_cfs_group() conditional. Now that
> looks OK, but maybe you can do that part in a separate patch with a
> little justification of its own.

To record (and extend) our discussion from IRC yesterday; I now do think
the above is in fact a problem.

The thing is that update_cfs_group() does not rely solely on the tg state;
it also contains magic to deal with ramp up, for which you later
introduce that max_h_load thing.

Specifically (re)read the second part of the comment describing
calc_group_shares() where it explains the ramp up:

 * The problem with it is that because the average is slow -- it was designed
 * to be exactly that of course -- this leads to transients in boundary
 * conditions. In specific, the case where the group was idle and we start the
 * one task. It takes time for our CPU's grq->avg.load_avg to build up,
 * yielding bad latency etc..

 (and further)

So by not always calling this (and not updating h_load) you fail to take
advantage of this.
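
To make the ramp-up concrete, here is a minimal userspace sketch of the
idea, not the kernel code itself. It assumes the share calculation takes
the larger of the instantaneous runqueue weight and the slow average, as
the calc_group_shares() comment describes. The struct grq_model type, the
helper names and the example numbers are all hypothetical and exist only
for this illustration.

```c
#include <assert.h>

/* Lower bound on a group's shares; assumed to match the kernel's MIN_SHARES. */
#define MIN_SHARES 2L

/* Hypothetical, simplified stand-in for one CPU's group runqueue state. */
struct grq_model {
	long load_weight;         /* instantaneous weight of enqueued entities */
	long load_avg;            /* slow-moving load average */
	long tg_load_avg_contrib; /* this CPU's last contribution to the tg total */
};

static long max_long(long a, long b)
{
	return a > b ? a : b;
}

static long clamp_long(long v, long lo, long hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Shares computed from the slow average only: no ramp-up handling. */
static long shares_from_avg_only(long tg_shares, long tg_load_avg,
				 const struct grq_model *grq)
{
	long shares = tg_shares * grq->load_avg;

	assert(tg_shares >= MIN_SHARES);
	if (tg_load_avg)
		shares /= tg_load_avg;

	return clamp_long(shares, MIN_SHARES, tg_shares);
}

/*
 * Shares computed with the ramp-up term: the instantaneous weight acts as
 * a lower bound on the local load, so a group that was idle on this CPU
 * gets a sensible share as soon as its first task is enqueued.
 */
static long shares_with_ramp_up(long tg_shares, long tg_load_avg,
				const struct grq_model *grq)
{
	long load = max_long(grq->load_weight, grq->load_avg);
	/* Swap this CPU's stale contribution for the fresh local load. */
	long tg_weight = tg_load_avg - grq->tg_load_avg_contrib + load;
	long shares = tg_shares * load;

	assert(tg_shares >= MIN_SHARES);
	if (tg_weight)
		shares /= tg_weight;

	return clamp_long(shares, MIN_SHARES, tg_shares);
}
```

With tg_shares = 1024, a tg-wide average of 2048 coming from other CPUs,
and a group that was idle here (load_avg = 0) and just enqueued one task
of weight 1024, the average-only variant yields the MIN_SHARES floor of 2,
while the ramp-up variant yields 1024 * 1024 / 3072 = 341. That difference
is only picked up if the recomputation actually runs on enqueue.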

So I would suggest keeping update_cfs_group() unconditional, and
recomputing the h_load for the entire hierarchy on enqueue.