linux-kernel - Re: [PATCH] sched: Forward deadline for early tick

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20241127070534.62903-1-zhouzihan30@jd.com>
Date: Wed, 27 Nov 2024 15:05:35 +0800
From: zhouzihan30 <15645113830zzh@...il.com>
To: peterz@...radead.org
Cc: 15645113830zzh@...il.com,
	bsegall@...gle.com,
	dietmar.eggemann@....com,
	juri.lelli@...hat.com,
	linux-kernel@...r.kernel.org,
	mgorman@...e.de,
	mingo@...hat.com,
	rostedt@...dmis.org,
	vincent.guittot@...aro.org,
	vschneid@...hat.com,
	yaozhenguo@...com,
	zhouzihan30@...com
Subject: Re: [PATCH] sched: Forward deadline for early tick

From: zhouzihan <zhouzihan30@...com>

Thank you very much for your reply, which has brought me lots
 of thoughts.

I have reconsidered this issue and believe that the root cause is that
 the kernel is difficult and unnecessary to implement an ideal eevdf
 due to real hardware:
for example,
an ideal eevdf may bring frequent switching, its cost makes kernel can't
 really do it.

So I see that the kernel has a very concise and clever implementation:
 update_deadline, which allows each task to use up the request size
 as much as possible in one go.

Here, the kernel has actually slightly violated eevdf: we are no longer
 concerned about whether a task is eligible for the time being.

In the prev patch, it was mentioned that due to tick errors, some tasks
 run longer than requested. So if we can do this: when a task vruntime
 approaches the deadline, we check if the task is eligible.
If it is eligible, we follow the previous logic and do not schedule it.
However, if it is ineligible, we schedule it because eevdf has the
 responsibility to not exec ineligible task.

In other words, the kernel has given the task a "benefit" based on the
 actual situation, and we still have the right to revoke this benefit.

In this way, it actually brings the kernel closer to an ideal eevdf,
and at the same time, your reply made me realize my mistake:
The deadline update should be updated using the following function,
which is more reasonable:
    vd_i += r_i / w_i
By using it, our scheduler is still fair,
 and each task can obtain its own time according to its weight.

About tolerance, I think min(vslice/128, tick/2) is better,
as your reply, vslice maybe too big, so use it.

However, there is still a new issue here:
If a se 1 terminates its deadline prematurely due to ineligible,
and then a se 2 runs, after the end of the run,
se 1 becomes eligible, but its deadline has already been pushed back,
which is not earliest eligible,
so se 1 can't run. This seems to not comply with eevdf specifications.

But I think this is acceptable. In the past, the logic causes a task to
 run an extra tick (ms), which means other tasks have to wait for an
 extra tick. Now, a task maybe run less time (us), but it will be
 compensated for in the next scheduling. In terms of overall accuracy,
 I think it has improved.

By the way, we may try to solve this by delaying the deadline update,
which means we schedule a task but not update its deadline,
util next exec it.
I am not sure if it is necessary to implement such complex logic here.
I think it is actually unnecessary,
because the ideal eevdf may not be suitable for the kernel.
It is no need to spend so much to solve the error of few time.
If there is a truly completely accurate system, it should not have
 tick error, and just closes the FORWARD_DEADLINE feature.
Of course, if you think it is necessary, I am willing try to implement it.

Signed-off-by: zhouzihan30 <zhouzihan30@...com>
---
 kernel/sched/fair.c     | 41 +++++++++++++++++++++++++++++++++++++----
 kernel/sched/features.h |  7 +++++++
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2d16c8545c71..e6e58c7d6d4c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1006,8 +1006,10 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se);
  */
 static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	if ((s64)(se->vruntime - se->deadline) < 0)
-		return false;
+
+	u64 vslice;
+	u64 tolerance = 0;
+	u64 next_deadline;
 
 	/*
 	 * For EEVDF the virtual time slope is determined by w_i (iow.
@@ -1016,11 +1018,42 @@ static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	 */
 	if (!se->custom_slice)
 		se->slice = sysctl_sched_base_slice;
+	vslice = calc_delta_fair(se->slice, se);
+
+	next_deadline = se->vruntime + vslice;
+
+	if (sched_feat(FORWARD_DEADLINE))
+		tolerance = min(vslice>>7, TICK_NSEC/2);
+
+	if ((s64)(se->vruntime + tolerance - se->deadline) < 0)
+		return false;
 
 	/*
-	 * EEVDF: vd_i = ve_i + r_i / w_i
+	 * when se->vruntime + tolerance - se->deadline >= 0
+	 * but se->vruntime - se->deadline < 0,
+	 * there is two case: if entity is eligible?
+	 * if entity is not eligible, we don't need wait deadline, because
+	 * eevdf don't guarantee
+	 * an ineligible entity can exec its request time in one go.
+	 * but when entity eligible, just let it run, which is the
+	 * same processing logic as before.
 	 */
-	se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
+
+	if (sched_feat(FORWARD_DEADLINE) && (s64)(se->vruntime - se->deadline) < 0) {
+		if (entity_eligible(cfs_rq, se)) {
+			return false;
+		} else {
+			/*
+			 * in this case, entity's request size does not use light，
+			 * but considering it is not eligible, we don't need exec it.
+			 * and we let vd_i += r_i / w_i, make scheduler fairness.
+			 */
+			next_deadline = se->deadline + vslice;
+		}
+	}
+
+
+	se->deadline = next_deadline;
 
 	/*
 	 * The task has consumed its request, reschedule.
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 290874079f60..5c74deec7209 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -24,6 +24,13 @@ SCHED_FEAT(RUN_TO_PARITY, true)
  */
 SCHED_FEAT(PREEMPT_SHORT, true)
 
+/*
+ * For some cases where the tick is faster than expected,
+ * move the deadline forward
+ */
+SCHED_FEAT(FORWARD_DEADLINE, true)
+
+
 /*
  * Prefer to schedule the task we woke last (assuming it failed
  * wakeup-preemption), since its likely going to consume data we
-- 
2.39.3 (Apple Git-146)