Message-ID: <20220910105326.1797-6-kprateek.nayak@amd.com>
Date:   Sat, 10 Sep 2022 16:23:26 +0530
From:   K Prateek Nayak <kprateek.nayak@....com>
To:     <linux-kernel@...r.kernel.org>
CC:     <aubrey.li@...ux.intel.com>, <efault@....de>,
        <gautham.shenoy@....com>, <libo.chen@...cle.com>,
        <mgorman@...hsingularity.net>, <mingo@...nel.org>,
        <peterz@...radead.org>, <song.bao.hua@...ilicon.com>,
        <srikar@...ux.vnet.ibm.com>, <tglx@...utronix.de>,
        <valentin.schneider@....com>, <vincent.guittot@...aro.org>,
        <wuyun.abel@...edance.com>, <wyes.karny@....com>,
        <yu.c.chen@...el.com>, <yangyicong@...wei.com>
Subject: [PATCH 5/5] sched/fair: Add exception for hints in load balancing path

- Load balancing considerations

If the MC domain has more runnable tasks than CPUs, ignore the hint set
by the user. Otherwise, honor the hint in the load balancer so that the
consolidation done at wakeup time is not lost.

- Considerations

Several approaches were tried to find a good threshold for ignoring
hints. Following are some of the wins and woes:

o Ignore hint if MC domain of src CPU does not have an idle core: This
  metric is not very accurate and led to losing consolidation early on.
o Ignore hint if sd_shared->nr_llc_scan is 0: This too, like the
  has_idle core metric, was not always accurate.
o An atomic read of sd_shared->nr_busy_cpus doesn't encapsulate
  overloaded run queues.

The best results came from scanning the LLC, counting the running
tasks, and comparing that count with the size of the LLC. If the LLC is
more than fully loaded, the hint can safely be ignored.
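
The threshold above can be modelled outside the kernel as a rough
sketch. Everything here is an illustrative stand-in and not kernel code:
struct cpu_load, llc_ignore_hint() and hint_allows_migration() are
hypothetical names, and the HINT_* bit values are placeholders for the
PR_SCHED_HINT_* flags defined earlier in this series. In the patch
itself the scan runs only when balancing at a domain without
SD_SHARE_PKG_RESOURCES, and ignore_hint defaults to 1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU run queue counters the patch reads. */
struct cpu_load {
	unsigned int nr_running;        /* all runnable tasks on this CPU */
	unsigned int idle_h_nr_running; /* runnable SCHED_IDLE tasks among them */
};

/* Placeholder bit values (assumption); the real flags come from the series. */
#define HINT_WAKE_AFFINE 0x1u
#define HINT_WAKE_HOLD   0x2u

/*
 * Return true when hints should be ignored, i.e. when the LLC holds
 * more non-idle runnable tasks than it has CPUs.
 */
static bool llc_ignore_hint(const struct cpu_load *llc, size_t llc_size)
{
	unsigned long nr_llc_running = 0;
	size_t i;

	for (i = 0; i < llc_size; i++)
		nr_llc_running += llc[i].nr_running - llc[i].idle_h_nr_running;

	return nr_llc_running > llc_size;
}

/*
 * Return true when the hint does not block migration of the task.
 * Wakeup-path hints pin the task only while hints are being honored.
 */
static bool hint_allows_migration(unsigned int hint, bool ignore_hint)
{
	if (ignore_hint)
		return true;

	return !(hint & (HINT_WAKE_AFFINE | HINT_WAKE_HOLD));
}
```

With four CPUs and at most one non-idle task per CPU, the sketch keeps
honoring hints; one extra runnable task anywhere in the LLC flips it to
ignoring them, which mirrors the nr_llc_running <= llc_size check in
load_balance().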

- Possible Improvements

o Consider the status of hint: If a wake affine hint was ignored in
  the wakeup path, consider ignoring it in the load balancer path as
  well, since the LLC the task is running on is not, in fact, the
  desired LLC.

Signed-off-by: K Prateek Nayak <kprateek.nayak@....com>
---
 kernel/sched/fair.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4c61bd0e93b3..8e1679b784fb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7810,6 +7810,9 @@ struct lb_env {
 	unsigned int		loop_break;
 	unsigned int		loop_max;
 
+	/* Indicator to ignore hint if LLC is overloaded */
+	int			ignore_hint;
+
 	enum fbq_type		fbq_type;
 	enum migration_type	migration_type;
 	struct list_head	tasks;
@@ -7977,6 +7980,21 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 		return 0;
 	}
 
+	/*
+	 * Hints are followed only if the MC Domain is still ideal
+	 * for the task.
+	 */
+	if (!env->ignore_hint) {
+		/*
+		 * Only consider the hints from the wakeup path to maintain
+		 * data locality.
+		 */
+		if (READ_ONCE(p->hint) &
+		    (PR_SCHED_HINT_WAKE_AFFINE | PR_SCHED_HINT_WAKE_HOLD))
+			return 0;
+	}
+
+
 	/* Record that we found at least one task that could run on dst_cpu */
 	env->flags &= ~LBF_ALL_PINNED;
 
@@ -10182,6 +10200,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		.cpus		= cpus,
 		.fbq_type	= all,
 		.tasks		= LIST_HEAD_INIT(env.tasks),
+		.ignore_hint	= 1,
 	};
 
 	cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
@@ -10213,6 +10232,30 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 	env.src_cpu = busiest->cpu;
 	env.src_rq = busiest;
 
+	/*
+	 * Check if the hints can be followed during
+	 * this load balancing cycle.
+	 */
+	if (!(sd->flags & SD_SHARE_PKG_RESOURCES)) {
+		struct sched_domain *src_sd_llc = rcu_dereference(per_cpu(sd_llc, env.src_cpu));
+
+		if (src_sd_llc) {
+			int cpu, nr_llc_running = 0, llc_size = per_cpu(sd_llc_size, env.src_cpu);
+
+			for_each_cpu_wrap(cpu, sched_domain_span(src_sd_llc), env.src_cpu) {
+				struct rq *rq = cpu_rq(cpu);
+				nr_llc_running += rq->nr_running - rq->cfs.idle_h_nr_running;
+			}
+
+			/*
+			 * Don't ignore hint if we can have one task
+			 * per CPU in the LLC of the src_cpu.
+			 */
+			if (nr_llc_running <= llc_size)
+				env.ignore_hint = 0;
+		}
+	}
+
 	ld_moved = 0;
 	/* Clear this flag as soon as we find a pullable task */
 	env.flags |= LBF_ALL_PINNED;
@@ -10520,6 +10563,7 @@ static int active_load_balance_cpu_stop(void *data)
 			.src_rq		= busiest_rq,
 			.idle		= CPU_IDLE,
 			.flags		= LBF_ACTIVE_LB,
+			.ignore_hint	= sd->flags & SD_SHARE_PKG_RESOURCES,
 		};
 
 		schedstat_inc(sd->alb_count);
-- 
2.25.1
