lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230207045838.11243-7-ricardo.neri-calderon@linux.intel.com>
Date:   Mon,  6 Feb 2023 20:58:34 -0800
From:   Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To:     "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Ricardo Neri <ricardo.neri@...el.com>,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>,
        Ben Segall <bsegall@...gle.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Len Brown <len.brown@...el.com>, Mel Gorman <mgorman@...e.de>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Ionela Voinescu <ionela.voinescu@....com>, x86@...nel.org,
        linux-kernel@...r.kernel.org,
        Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>,
        "Tim C . Chen" <tim.c.chen@...el.com>
Subject: [PATCH v3 06/10] sched/fair: Use the prefer_sibling flag of the current sched domain

SD_PREFER_SIBLING is set from the SMT scheduling domain up to the first
non-NUMA domain (the exception is systems with SD_ASYM_CPUCAPACITY).

Above the SMT sched domain, all domains have a child. The SD_PREFER_
SIBLING is honored always regardless of the scheduling domain at which the
load balance takes place.

There are cases, however, in which the busiest CPU's sched domain has
child but the destination CPU's does not. Consider, for instance a non-SMT
core (or an SMT core with only one online sibling) doing load balance with
an SMT core at the MC level. SD_PREFER_SIBLING will not be honored. We are
left with a fully busy SMT core and an idle non-SMT core.

Avoid inconsistent behavior. Use the prefer_sibling behavior at the current
scheduling domain, not its child.

The NUMA sched domain does not have the SD_PREFER_SIBLING flag. Thus, we
will not spread load among NUMA sched groups, as desired.

Cc: Ben Segall <bsegall@...gle.com>
Cc: Daniel Bristot de Oliveira <bristot@...hat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Len Brown <len.brown@...el.com>
Cc: Mel Gorman <mgorman@...e.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Tim C. Chen <tim.c.chen@...el.com>
Cc: Valentin Schneider <vschneid@...hat.com>
Cc: x86@...nel.org
Cc: linux-kernel@...r.kernel.org
Suggested-by: Valentin Schneider <vschneid@...hat.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
---
Changes since v2:
 * Introduced this patch.

Changes since v1:
 * N/A
---
 kernel/sched/fair.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df7bcbf634a8..a37ad59f20ea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10004,7 +10004,6 @@ static void update_idle_cpu_scan(struct lb_env *env,
 
 static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sds)
 {
-	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
 	struct sg_lb_stats *local = &sds->local_stat;
 	struct sg_lb_stats tmp_sgs;
@@ -10045,9 +10044,11 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		sg = sg->next;
 	} while (sg != env->sd->groups);
 
-	/* Tag domain that child domain prefers tasks go to siblings first */
-	sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING;
-
+	/*
+	 * Tag domain that @env::sd prefers to spread excess tasks among
+	 * sibling sched groups.
+	 */
+	sds->prefer_sibling = env->sd->flags & SD_PREFER_SIBLING;
 
 	if (env->sd->flags & SD_NUMA)
 		env->fbq_type = fbq_classify_group(&sds->busiest_stat);
@@ -10346,7 +10347,6 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 			goto out_balanced;
 	}
 
-	/* Try to move all excess tasks to child's sibling domain */
 	if (sds.prefer_sibling && local->group_type == group_has_spare &&
 	    busiest->sum_nr_running > local->sum_nr_running + 1)
 		goto force_balance;
-- 
2.25.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ