lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230406203148.19182-7-ricardo.neri-calderon@linux.intel.com>
Date:   Thu,  6 Apr 2023 13:31:42 -0700
From:   Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To:     "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Ricardo Neri <ricardo.neri@...el.com>,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>,
        Ben Segall <bsegall@...gle.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Len Brown <len.brown@...el.com>, Mel Gorman <mgorman@...e.de>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Ionela Voinescu <ionela.voinescu@....com>, x86@...nel.org,
        linux-kernel@...r.kernel.org,
        Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>,
        "Tim C . Chen" <tim.c.chen@...el.com>
Subject: [PATCH v4 06/12] sched/fair: Use the busiest group to set prefer_sibling

The prefer_sibling setting acts on the busiest group to move excess tasks
to the local group. This should be done as per request of the child of the
busiest group's sched domain, not the local group's.

Using the flags of the child domain of the local group works fortuitously
if both groups have child domains.

There are cases, however, in which the busiest group's sched domain has
child but the local group's does not. Consider, for instance a non-SMT
core (or an SMT core with only one online sibling) doing load balance with
an SMT core at the MC level. SD_PREFER_SIBLING of the busiest group's child
domain will not be honored. We are left with a fully busy SMT core and an
idle non-SMT core.

Cc: Ben Segall <bsegall@...gle.com>
Cc: Daniel Bristot de Oliveira <bristot@...hat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Ionela Voinescu <ionela.voinescu@....com>
Cc: Len Brown <len.brown@...el.com>
Cc: Mel Gorman <mgorman@...e.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Tim C. Chen <tim.c.chen@...el.com>
Cc: Valentin Schneider <vschneid@...hat.com>
Cc: x86@...nel.org
Cc: linux-kernel@...r.kernel.org
Tested-by: Zhang Rui <rui.zhang@...el.com>
Suggested-by: Dietmar Eggemann <dietmar.eggemann@....com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
---
Changes since v3:
 * Dropped patch that set prefer_sibling from the current sched domain
   level. Instead, uses the flags of the busiest group's child sched
   domain. (Vincent G, PeterZ, Dietmar)

Changes since v2:
 * Introduced this patch.

Changes since v1:
 * N/A
---
 kernel/sched/fair.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ea23a5163bfa..477cb5777036 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10056,7 +10056,6 @@ static void update_idle_cpu_scan(struct lb_env *env,
 
 static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sds)
 {
-	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
 	struct sg_lb_stats *local = &sds->local_stat;
 	struct sg_lb_stats tmp_sgs;
@@ -10097,8 +10096,13 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		sg = sg->next;
 	} while (sg != env->sd->groups);
 
-	/* Tag domain that child domain prefers tasks go to siblings first */
-	sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING;
+	/*
+	 * Indicate that the child domain of the busiest group prefers tasks
+	 * go to a child's sibling domains first. NB the flags of a sched group
+	 * are those of the child domain.
+	 */
+	if (sds->busiest)
+		sds->prefer_sibling = !!(sds->busiest->flags & SD_PREFER_SIBLING);
 
 
 	if (env->sd->flags & SD_NUMA)
@@ -10398,7 +10402,10 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 			goto out_balanced;
 	}
 
-	/* Try to move all excess tasks to child's sibling domain */
+	/*
+	 * Try to move all excess tasks to a sibling domain of the busiest
+	 * group's child domain.
+	 */
 	if (sds.prefer_sibling && local->group_type == group_has_spare &&
 	    busiest->sum_nr_running > local->sum_nr_running + 1)
 		goto force_balance;
-- 
2.25.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ