Date:	Tue, 14 Jun 2016 17:28:01 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...nel.org>, Mike Galbraith <efault@....de>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: [PATCH 2/3] sched: Unloop sched avg decaying

The code that decays the scheduler rt avg periodically (every 0.5
seconds) catches up with lost updates within a loop. This is fine when
CPUs sleep in dynticks mode for short periods of time, as they only
miss a few updates that are caught up through a few iterations. But CPUs
can sleep for arbitrarily long periods, leading to an unbounded number
of loop iterations to catch up: a CPU idle for an hour, for example,
would need 7200 iterations. As a result, the computation time of
sched_avg_update() can introduce noticeable latency.

This patch converts that loop into a single division. Testing on an
idle load has shown positive results. Below is a function profile
comparison between the upstream version (renamed sched_avg_update_old)
and the new version (sched_avg_update), with the two called alternately.

The new version is up to 2 times faster on average.

	Function                                         Hit    Time            Avg             s^2
	--------                                         ---    ----            ---             ---
	trace_stat/function0:  sched_avg_update_old   135120    47612.36 us     0.352 us        1049.606 us
	trace_stat/function0:  sched_avg_update       135120    42668.44 us     0.315 us        1617.970 us
	trace_stat/function1:  sched_avg_update_old   132074    46618.41 us     0.352 us        3653.566 us
	trace_stat/function1:  sched_avg_update       132074    41356.76 us     0.313 us        3161.585 us
	trace_stat/function2:  sched_avg_update_old   121335    43381.10 us     0.357 us        1262.240 us
	trace_stat/function2:  sched_avg_update       121335    38240.72 us     0.315 us        970.400 us
	trace_stat/function3:  sched_avg_update_old    37563    17640.67 us     0.469 us        148.206 us
	trace_stat/function3:  sched_avg_update        37563    11059.65 us     0.294 us        126.548 us
	trace_stat/function4:  sched_avg_update_old    24200    13557.05 us     0.560 us        57.124 us
	trace_stat/function4:  sched_avg_update        24200    6687.281 us     0.276 us        9.528 us
	trace_stat/function5:  sched_avg_update_old    28572    15158.76 us     0.530 us        674.049 us
	trace_stat/function5:  sched_avg_update        28573    8012.361 us     0.280 us        181.687 us
	trace_stat/function6:  sched_avg_update_old    23424    12987.64 us     0.554 us        27.639 us
	trace_stat/function6:  sched_avg_update        23425    6264.433 us     0.267 us        2.965 us
	trace_stat/function7:  sched_avg_update_old    22192    13083.69 us     0.589 us        25.605 us
	trace_stat/function7:  sched_avg_update        22191    5947.785 us     0.268 us        1.748 us

Below is a snapshot of the same function profile under a fully busy load
(perf bench sched messaging). The old and new versions show roughly the
same results there, with a tiny 2-3 ns advantage to the old version,
which shouldn't matter much as the function is called every 0.5 seconds.

	Function                                         Hit    Time            Avg             s^2
	--------                                         ---    ----            ---             ---
	trace_stat/function0:  sched_avg_update        106699    8029.961 us     0.075 us        0.244 us
	trace_stat/function0:  sched_avg_update_old    106698    7852.948 us     0.073 us        0.313 us
	trace_stat/function1:  sched_avg_update        106547    8066.833 us     0.075 us        1.256 us
	trace_stat/function1:  sched_avg_update_old    106547    7794.896 us     0.073 us        1.521 us
	trace_stat/function2:  sched_avg_update        106527    8049.326 us     0.075 us        1.141 us
	trace_stat/function2:  sched_avg_update_old    106528    7818.155 us     0.073 us        1.052 us
	trace_stat/function3:  sched_avg_update        106534    8056.079 us     0.075 us        0.342 us
	trace_stat/function3:  sched_avg_update_old    106535    7815.416 us     0.073 us        0.369 us
	trace_stat/function4:  sched_avg_update        106433    8090.462 us     0.076 us        5.359 us
	trace_stat/function4:  sched_avg_update_old    106433    7879.694 us     0.074 us        0.433 us
	trace_stat/function5:  sched_avg_update        106426    8127.800 us     0.076 us        1.304 us
	trace_stat/function5:  sched_avg_update_old    106425    7854.538 us     0.073 us        2.466 us
	trace_stat/function6:  sched_avg_update        106436    8067.921 us     0.075 us        0.257 us
	trace_stat/function6:  sched_avg_update_old    106436    7830.492 us     0.073 us        0.334 us
	trace_stat/function7:  sched_avg_update        106427    8135.786 us     0.076 us        13.181 us
	trace_stat/function7:  sched_avg_update_old    106428    7940.925 us     0.074 us        0.982 us

In conclusion, the new version behaves roughly the same under busy load,
but is faster and much better bounded under idle load.

Cc: Mike Galbraith <efault@....de>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Ingo Molnar <mingo@...nel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@...il.com>
---
 kernel/sched/core.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 385c947..0c0578a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -666,17 +666,17 @@ bool sched_can_stop_tick(struct rq *rq)
 void sched_avg_update(struct rq *rq)
 {
 	s64 period = sched_avg_period();
+	s64 delta;
+	u64 rem;
+	int pending;
 
-	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
-		/*
-		 * Inline assembly required to prevent the compiler
-		 * optimising this loop into a divmod call.
-		 * See __iter_div_u64_rem() for another example of this.
-		 */
-		asm("" : "+rm" (rq->age_stamp));
-		rq->age_stamp += period;
-		rq->rt_avg /= 2;
-	}
+	delta = (s64)(rq_clock(rq) - rq->age_stamp);
+	if (delta <= period)
+		return;
+
+	pending = div64_u64_rem(delta, period, &rem);
+	rq->age_stamp += delta - rem;
+	rq->rt_avg >>= pending;
 }
 
 #endif /* CONFIG_SMP */
-- 
2.7.0
