Date:   Wed, 6 Feb 2019 17:04:13 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     linux-kernel@...r.kernel.org, mingo@...nel.org,
        vincent.guittot@...aro.org, morten.rasmussen@....com,
        Dietmar.Eggemann@....com
Subject: Re: [PATCH 4/5] sched/fair: Tune down misfit nohz kicks

On Thu, Jan 17, 2019 at 03:34:10PM +0000, Valentin Schneider wrote:
> In
> 
>   commit 3b1baa6496e6 ("sched/fair: Add 'group_misfit_task' load-balance type")
> 
> we set rq->misfit_task_load whenever the current running task has a
> utilization greater than 80% of rq->cpu_capacity. A non-zero value in
> this field enables misfit load balancing.
> 
> However, if the task being looked at is already running on a CPU of
> highest capacity, there's nothing more we can do for it. We can
> currently spot this in update_sd_pick_busiest(), which prevents us
> from selecting a sched_group of group_type == group_misfit_task as the
> busiest group, but we don't do any of that in nohz_balancer_kick().
> 
> This means that we could repeatedly kick nohz CPUs when there are no
> load-balance improvements to be made.
> 
> Introduce a check_misfit_status() helper that returns true iff there
> is a CPU in the system that could give more CPU capacity to a rq's
> misfit task - IOW, there exists a CPU of higher capacity_orig or the
> rq's CPU is severely pressured by rt/IRQ.
> 
> Signed-off-by: Valentin Schneider <valentin.schneider@....com>

> +static inline int check_misfit_status(struct rq *rq, struct sched_domain *sd)
> +{
> +	return rq->misfit_task_load &&
> +		(rq->cpu_capacity_orig < rq->rd->max_cpu_capacity ||
> +		 check_cpu_capacity(rq, sd));
> +}


> @@ -9527,7 +9539,7 @@ static void nohz_balancer_kick(struct rq *rq)
>  	if (time_before(now, nohz.next_balance))
>  		goto out;
>  
> -	if (rq->nr_running >= 2 || rq->misfit_task_load) {
> +	if (rq->nr_running >= 2) {
>  		flags = NOHZ_KICK_MASK;
>  		goto out;
>  	}
> @@ -9561,6 +9573,14 @@ static void nohz_balancer_kick(struct rq *rq)

> 	sd = rcu_dereference(rq->sd);
> 	if (sd) {
> 		if ((rq->cfs.h_nr_running >= 1) &&
> 		    check_cpu_capacity(rq, sd)) {
> 			flags = NOHZ_KICK_MASK;
> 			goto unlock;
>  		}
>  	}
>  
> +	sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, cpu));
> +	if (sd) {
> +		if (check_misfit_status(rq, sd)) {
> +			flags = NOHZ_KICK_MASK;
> +			goto unlock;
> +		}
> +	}

So while the exact @sd to use for check_cpu_capacity() likely doesn't
matter, this is an 'implicit' test for actually having asym_capacity.
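(Editor's note, not part of the thread: check_cpu_capacity() tests whether a
CPU's remaining capacity has dropped significantly below its original
capacity due to rt/IRQ pressure, using the domain's imbalance_pct as the
margin. A simplified, self-contained model of that check, with assumed
plain-integer parameters in place of the rq and sched_domain fields:)

```c
#include <assert.h>

/*
 * Simplified model of the kernel's check_cpu_capacity(): returns nonzero
 * when cpu_capacity (capacity left over after rt/IRQ pressure) has fallen
 * below cpu_capacity_orig by more than the imbalance_pct margin.
 * imbalance_pct is a percentage scaled by 100, e.g. 117 means the original
 * capacity must exceed the remaining capacity by more than ~17%.
 */
static int check_cpu_capacity_model(unsigned long cpu_capacity,
                                    unsigned long cpu_capacity_orig,
                                    unsigned int imbalance_pct)
{
	return cpu_capacity * imbalance_pct < cpu_capacity_orig * 100;
}
```

For example, with imbalance_pct == 117, a CPU of original capacity 1024
whose remaining capacity is squeezed to 800 by rt/IRQ pressure is flagged
(800 * 117 = 93600 < 1024 * 100 = 102400), while 1000 is not.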

Fair enough I suppose. However, now that you wrote such a nice comment
for the sd_llc_shared case, it's a shame these other two cases don't have
one.

So how about you add something like:

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9589,8 +9589,12 @@ static void nohz_balancer_kick(struct rq
 
 	sd = rcu_dereference(rq->sd);
 	if (sd) {
-		if ((rq->cfs.h_nr_running >= 1) &&
-		    check_cpu_capacity(rq, sd)) {
+		/*
+		 * If there's a CFS task and the current CPU has reduced
+		 * capacity; kick the ILB to see if there's a better CPU to run
+		 * on.
+		 */
+		if (rq->cfs.h_nr_running >= 1 && check_cpu_capacity(rq, sd)) {
 			flags = NOHZ_KICK_MASK;
 			goto unlock;
 		}
@@ -9598,6 +9602,10 @@ static void nohz_balancer_kick(struct rq
 
 	sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, cpu));
 	if (sd) {
+		/*
+		 * When ASYM_CAPACITY; see if there's a higher capacity CPU to
+		 * run the misfit task on.
+		 */
 		if (check_misfit_status(rq, sd)) {
 			flags = NOHZ_KICK_MASK;
 			goto unlock;
@@ -9606,6 +9614,10 @@ static void nohz_balancer_kick(struct rq
 
 	sd = rcu_dereference(per_cpu(sd_asym_packing, cpu));
 	if (sd) {
+		/*
+		 * When ASYM_PACKING; see if there's a more preferred CPU going
+		 * idle; in which case, kick the ILB to move tasks around.
+		 */
 		for_each_cpu_and(i, sched_domain_span(sd), nohz.idle_cpus_mask) {
 			if (sched_asym_prefer(i, cpu)) {
 				flags = NOHZ_KICK_MASK;

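(Editor's note, not part of the thread: the "utilization greater than 80% of
rq->cpu_capacity" criterion from the quoted changelog comes from the
scheduler's fixed-point capacity margin of 1280/1024 = 1.25. A minimal
sketch of that fit test, with an assumed helper name and plain-integer
parameters standing in for the rq/task fields:)

```c
#include <assert.h>

/*
 * 1280/1024 expresses a 25% headroom requirement in fixed point:
 * util * 1280 > capacity * 1024 is equivalent to util > 0.8 * capacity.
 */
#define CAPACITY_MARGIN 1280

/*
 * Simplified model of the misfit criterion: a task is a misfit on a CPU
 * when its utilization exceeds 80% of that CPU's capacity, i.e. the CPU
 * cannot offer the 25% headroom the margin demands.
 */
static int task_is_misfit(unsigned long task_util, unsigned long cpu_capacity)
{
	return task_util * CAPACITY_MARGIN > cpu_capacity * 1024;
}
```

So on a CPU of capacity 1024, a task with utilization 900 is a misfit
(900 > 0.8 * 1024 = 819.2) while one at 512 is not; it is this condition
that populates rq->misfit_task_load, which check_misfit_status() then
gates on.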