linux-kernel - Re: [RFC 3/4] sched/fair: replace runnable load average by runnable average

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200212143023.GV3420@suse.de>
Date:   Wed, 12 Feb 2020 14:30:23 +0000
From:   Mel Gorman <mgorman@...e.de>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        linux-kernel@...r.kernel.org, pauld@...hat.com,
        parth@...ux.ibm.com, valentin.schneider@....com
Subject: Re: [RFC 3/4] sched/fair: replace runnable load average by runnable
 average

On Tue, Feb 11, 2020 at 06:46:50PM +0100, Vincent Guittot wrote:
> Now that runnable_load_avg is not more used, we can replace it by a new
> signal that will highlight the runnable pressure on a cfs_rq. This signal
> track the waiting time of tasks on rq and can help to better define the
> state of rqs.
> 
> At now, only util_avg is used to define the state of a rq:
>   A rq with more that around 80% of utilization and more than 1 tasks is
>   considered as overloaded.
> 
> But the util_avg signal of a rq can become temporaly low after that a task
> migrated onto another rq which can bias the classification of the rq.
> 
> When tasks compete for the same rq, their runnable average signal will be
> higher than util_avg as it will include the waiting time and we can use
> this signal to better classify cfs_rqs.
> 
> The new runnable_avg will track the runnable time of a task which simply
> adds the waiting time to the running time. The runnbale _avg of cfs_rq
> will be the /Sum of se's runnable_avg and the runnable_avg of group entity
> will follow the one of the rq similarly to util_avg.
> 

s/runnbale/runnable/

Otherwise, all I can do is give a heads-up that I will not be able to
review this patch and the next patch properly in the short-term. While the
new metric appears to have a sensible definition, I've not spent enough
time comparing/contrasting the pro's and con's of PELT implementation
details or their consequences. I am not confident I can accurately
predict whether this is better or if there are corner cases that make
poor placement decisions based on fast changes of runnable_avg. At least
not within a reasonable amount of time.

This caught my attention though

> @@ -4065,8 +4018,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	 *   - Add its new weight to cfs_rq->load.weight
>  	 */
>  	update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH);
> +	se_update_runnable(se);
>  	update_cfs_group(se);
> -	enqueue_runnable_load_avg(cfs_rq, se);
>  	account_entity_enqueue(cfs_rq, se);
>  
>  	if (flags & ENQUEUE_WAKEUP)

I don't think the ordering matters any more because of what was removed
from update_cfs_group. Unfortunately, I'm not 100% confident so am
bringing it to your attention in case it does.

-- 
Mel Gorman
SUSE Labs