Message-Id: <20250207061359.31442-1-15645113830zzh@gmail.com>
Date: Fri,  7 Feb 2025 14:14:00 +0800
From: zihan zhou <15645113830zzh@...il.com>
To: 15645113830zzh@...il.com
Cc: bsegall@...gle.com,
	dietmar.eggemann@....com,
	juri.lelli@...hat.com,
	linux-kernel@...r.kernel.org,
	mgorman@...e.de,
	mingo@...hat.com,
	peterz@...radead.org,
	rostedt@...dmis.org,
	vincent.guittot@...aro.org,
	vschneid@...hat.com
Subject: [PATCH V2 1/2] sched: Reduce the default slice to avoid tasks getting an extra tick

Reduce the default slice and add a comment explaining why this
modification was made.

Signed-off-by: zihan zhou <15645113830zzh@...il.com>
---
 kernel/sched/fair.c | 47 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 43 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 26958431deb7..754b0785eaa0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -71,10 +71,49 @@ unsigned int sysctl_sched_tunable_scaling = SCHED_TUNABLESCALING_LOG;
 /*
  * Minimal preemption granularity for CPU-bound tasks:
  *
- * (default: 0.75 msec * (1 + ilog(ncpus)), units: nanoseconds)
- */
-unsigned int sysctl_sched_base_slice			= 750000ULL;
-static unsigned int normalized_sysctl_sched_base_slice	= 750000ULL;
+ * (default: 0.70 msec * (1 + ilog(ncpus)), units: nanoseconds)
+ *
+ * The old default value for the slice was 0.75 msec * (1 + ilog(ncpus)),
+ * which gives a default slice of
+ * 0.75 for 1 cpu
+ * 1.50 up to 3 cpus
+ * 2.25 up to 7 cpus
+ * 3.00 for 8 cpus and above.
+ *
+ * For HZ=250 and HZ=100, because of the coarse tick resolution, the actual
+ * runtime of tasks far exceeds their slice.
+ * For HZ=1000 with 8 cpus or more, the tick accuracy is already
+ * satisfactory, but there is still an issue: tasks will get an extra
+ * tick because the tick often arrives a little earlier than expected. In
+ * this case, the task can only wait until the next tick to learn that it
+ * has reached its deadline, and so runs about 1ms longer.
+ *
+ * vruntime + sysctl_sched_base_slice =     deadline
+ *         |-----------|-----------|-----------|-----------|
+ *              1ms          1ms         1ms         1ms
+ *                    ^           ^           ^           ^
+ *                  tick1       tick2       tick3       tick4(nearly 4ms)
+ *
+ * There are two sources of tick error: clockevent precision and
+ * CONFIG_IRQ_TIME_ACCOUNTING/CONFIG_PARAVIRT_TIME_ACCOUNTING.
+ * With CONFIG_IRQ_TIME_ACCOUNTING every tick is less than 1ms, but even
+ * without it, because of clockevent precision, the tick period is still
+ * often less than 1ms.
+ *
+ * To make scheduling more precise, we changed 0.75 to 0.70. Using 0.70
+ * instead of 0.75 should not change much for other configs, and fixes
+ * this issue:
+ * 0.70 for 1 cpu
+ * 1.40 up to 3 cpus
+ * 2.10 up to 7 cpus
+ * 2.80 for 8 cpus and above.
+ *
+ * This does not guarantee that tasks always run for exactly their slice,
+ * but occasionally running an extra tick has little impact.
+ *
+ */
+unsigned int sysctl_sched_base_slice			= 700000ULL;
+static unsigned int normalized_sysctl_sched_base_slice	= 700000ULL;
 
 const_debug unsigned int sysctl_sched_migration_cost	= 500000UL;
 
-- 
2.33.0

