Date:   Tue, 30 Jan 2018 10:45:54 +0000
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Mike Galbraith <efault@....de>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        LKML <linux-kernel@...r.kernel.org>,
        Mel Gorman <mgorman@...hsingularity.net>
Subject: [PATCH 3/4] sched/fair: Do not migrate if the prev_cpu is idle

wake_affine_idle prefers to move a task to the current CPU if the
wakeup is due to an interrupt. The expectation is that the interrupt
data is cache hot and relevant to the waking task as well as avoiding
a search. However, there is no way to determine if there was cache hot
data on the previous CPU that may exceed the interrupt data. Furthermore,
round-robin delivery of interrupts can migrate tasks around a socket where
each CPU is under-utilised.  This can interact badly with cpufreq which
makes decisions based on per-cpu data. It has been observed on machines
with HWP that p-states are not boosted to their maximum levels even though
the workload is latency and throughput sensitive.

This patch uses the previous CPU for the task if it's idle and cache-affine
with the current CPU, even if the current CPU is idle due to the wakeup
being related to the interrupt. This reduces migrations at the cost of
the interrupt data not being cache hot when the task wakes.
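
For illustration only, the selection rule described above can be sketched
in userspace C. This is not the kernel code: the cpu_is_idle and cpu_llc_id
arrays, the NR_CPUS value and pick_idle_target() are hypothetical stand-ins,
and idle_cpu()/cpus_share_cache() are stubbed here under the same names as
the kernel helpers purely for readability.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical, for the sketch only */

/* Hypothetical per-CPU state; the kernel tracks this very differently. */
static bool cpu_is_idle[NR_CPUS];
static int cpu_llc_id[NR_CPUS];	/* CPUs with equal ids share a cache */

/* Userspace stub, not the kernel's idle_cpu(). */
static bool idle_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpu_is_idle[cpu];
}

/* Userspace stub, not the kernel's cpus_share_cache(). */
static bool cpus_share_cache(int a, int b)
{
	assert(a >= 0 && a < NR_CPUS && b >= 0 && b < NR_CPUS);
	return cpu_llc_id[a] == cpu_llc_id[b];
}

/*
 * Returns the CPU chosen by the idle check, or -1 if this check does
 * not decide. With the patch, an idle cache-affine prev_cpu is preferred
 * over an idle this_cpu, which avoids a migration.
 */
static int pick_idle_target(int this_cpu, int prev_cpu)
{
	if (idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
		return idle_cpu(prev_cpu) ? prev_cpu : this_cpu;

	return -1;
}
```

With both CPUs idle and sharing a cache, pick_idle_target(0, 1) returns 1
(the previous CPU). If CPU 1 is busy it returns 0, and if the two CPUs do
not share a cache it returns -1 so that later checks decide.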

A variety of workloads were tested on various machines and no adverse
impact outside the noise was noticed. dbench on ext4 on UMA showed a
roughly 10% reduction in the number of CPU migrations and it is a case
where interrupts are frequent for IO completions. In most cases, the
difference in performance is quite small but variability is often
reduced. For example, this is the result for pgbench running on a UMA
machine with different numbers of clients.

                         4.15.0-rc9             4.15.0-rc9
                           baseline              waprev-v1
Hmean     1     22096.28 (   0.00%)    22734.86 (   2.89%)
Hmean     4     74633.42 (   0.00%)    75496.77 (   1.16%)
Hmean     7    115017.50 (   0.00%)   113030.81 (  -1.73%)
Hmean     12   126209.63 (   0.00%)   126613.40 (   0.32%)
Hmean     16   131886.91 (   0.00%)   130844.35 (  -0.79%)
Stddev    1       636.38 (   0.00%)      417.11 (  34.46%)
Stddev    4       614.64 (   0.00%)      583.24 (   5.11%)
Stddev    7       542.46 (   0.00%)      435.45 (  19.73%)
Stddev    12      173.93 (   0.00%)      171.50 (   1.40%)
Stddev    16      671.42 (   0.00%)      680.30 (  -1.32%)
CoeffVar  1         2.88 (   0.00%)        1.83 (  36.26%)

Note that the difference in performance is marginal but, at low
utilisation, there is less variability.
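
As a reading aid for the pgbench table, the bracketed percentages appear
consistent with a relative change against the baseline column, signed so
that a positive value is an improvement (higher Hmean, lower Stddev). The
sketch below rests on that assumption. The helper names are hypothetical
and this is not the reporting tool's code.

```c
#include <assert.h>

/* Relative gain in percent when a larger value is better (e.g. Hmean). */
static double pct_gain_higher_is_better(double baseline, double patched)
{
	assert(baseline != 0.0);
	return (patched - baseline) / baseline * 100.0;
}

/* Relative gain in percent when a smaller value is better (e.g. Stddev). */
static double pct_gain_lower_is_better(double baseline, double patched)
{
	assert(baseline != 0.0);
	return (baseline - patched) / baseline * 100.0;
}
```

For the single-client rows, pct_gain_higher_is_better(22096.28, 22734.86)
is about 2.89 and pct_gain_lower_is_better(636.38, 417.11) is about 34.46,
matching the Hmean and Stddev entries in the table.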

Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
---
 kernel/sched/fair.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1aebe79da2ab..3b732caa6fba 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5704,9 +5704,15 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
 	 * context. Only allow the move if cache is shared. Otherwise an
 	 * interrupt intensive workload could force all tasks onto one
 	 * node depending on the IO topology or IRQ affinity settings.
+	 *
+	 * If the prev_cpu is idle and cache affine then avoid a migration.
+	 * There is no guarantee that the cache hot data from an interrupt
+	 * is more important than cache hot data on the prev_cpu and from
+	 * a cpufreq perspective, it's better to have higher utilisation
+	 * on one CPU.
 	 */
 	if (idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
-		return this_cpu;
+		return idle_cpu(prev_cpu) ? prev_cpu : this_cpu;

 	if (sync && cpu_rq(this_cpu)->nr_running == 1)
 		return this_cpu;
-- 
2.15.1