[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20241127070534.62903-1-zhouzihan30@jd.com>
Date: Wed, 27 Nov 2024 15:05:35 +0800
From: zhouzihan30 <15645113830zzh@...il.com>
To: peterz@...radead.org
Cc: 15645113830zzh@...il.com,
bsegall@...gle.com,
dietmar.eggemann@....com,
juri.lelli@...hat.com,
linux-kernel@...r.kernel.org,
mgorman@...e.de,
mingo@...hat.com,
rostedt@...dmis.org,
vincent.guittot@...aro.org,
vschneid@...hat.com,
yaozhenguo@...com,
zhouzihan30@...com
Subject: Re: [PATCH] sched: Forward deadline for early tick
From: zhouzihan <zhouzihan30@...com>
Thank you very much for your reply, which has brought me lots
of thoughts.
I have reconsidered this issue and believe that the root cause is that
the kernel is difficult and unnecessary to implement an ideal eevdf
due to real hardware:
for example,
an ideal eevdf may bring frequent switching, its cost makes kernel can't
really do it.
So I see that the kernel has a very concise and clever implementation:
update_deadline, which allows each task to use up the request size
as much as possible in one go.
Here, the kernel has actually slightly violated eevdf: we are no longer
concerned about whether a task is eligible for the time being.
In the prev patch, it was mentioned that due to tick errors, some tasks
run longer than requested. So if we can do this: when a task vruntime
approaches the deadline, we check if the task is eligible.
If it is eligible, we follow the previous logic and do not schedule it.
However, if it is ineligible, we schedule it because eevdf has the
responsibility to not exec ineligible task.
In other words, the kernel has given the task a "benefit" based on the
actual situation, and we still have the right to revoke this benefit.
In this way, it actually brings the kernel closer to an ideal eevdf,
and at the same time, your reply made me realize my mistake:
The deadline update should be updated using the following function,
which is more reasonable:
vd_i += r_i / w_i
By using it, our scheduler is still fair,
and each task can obtain its own time according to its weight.
About tolerance, I think min(vslice/128, tick/2) is better,
as your reply, vslice maybe too big, so use it.
However, there is still a new issue here:
If a se 1 terminates its deadline prematurely due to ineligible,
and then a se 2 runs, after the end of the run,
se 1 becomes eligible, but its deadline has already been pushed back,
which is not earliest eligible,
so se 1 can't run. This seems to not comply with eevdf specifications.
But I think this is acceptable. In the past, the logic causes a task to
run an extra tick (ms), which means other tasks have to wait for an
extra tick. Now, a task maybe run less time (us), but it will be
compensated for in the next scheduling. In terms of overall accuracy,
I think it has improved.
By the way, we may try to solve this by delaying the deadline update,
which means we schedule a task but not update its deadline,
util next exec it.
I am not sure if it is necessary to implement such complex logic here.
I think it is actually unnecessary,
because the ideal eevdf may not be suitable for the kernel.
It is no need to spend so much to solve the error of few time.
If there is a truly completely accurate system, it should not have
tick error, and just closes the FORWARD_DEADLINE feature.
Of course, if you think it is necessary, I am willing try to implement it.
Signed-off-by: zhouzihan30 <zhouzihan30@...com>
---
kernel/sched/fair.c | 41 +++++++++++++++++++++++++++++++++++++----
kernel/sched/features.h | 7 +++++++
2 files changed, 44 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2d16c8545c71..e6e58c7d6d4c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1006,8 +1006,10 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se);
*/
static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- if ((s64)(se->vruntime - se->deadline) < 0)
- return false;
+
+ u64 vslice;
+ u64 tolerance = 0;
+ u64 next_deadline;
/*
* For EEVDF the virtual time slope is determined by w_i (iow.
@@ -1016,11 +1018,42 @@ static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
*/
if (!se->custom_slice)
se->slice = sysctl_sched_base_slice;
+ vslice = calc_delta_fair(se->slice, se);
+
+ next_deadline = se->vruntime + vslice;
+
+ if (sched_feat(FORWARD_DEADLINE))
+ tolerance = min(vslice>>7, TICK_NSEC/2);
+
+ if ((s64)(se->vruntime + tolerance - se->deadline) < 0)
+ return false;
/*
- * EEVDF: vd_i = ve_i + r_i / w_i
+ * when se->vruntime + tolerance - se->deadline >= 0
+ * but se->vruntime - se->deadline < 0,
+ * there is two case: if entity is eligible?
+ * if entity is not eligible, we don't need wait deadline, because
+ * eevdf don't guarantee
+ * an ineligible entity can exec its request time in one go.
+ * but when entity eligible, just let it run, which is the
+ * same processing logic as before.
*/
- se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
+
+ if (sched_feat(FORWARD_DEADLINE) && (s64)(se->vruntime - se->deadline) < 0) {
+ if (entity_eligible(cfs_rq, se)) {
+ return false;
+ } else {
+ /*
+ * in this case, entity's request size does not use light,
+ * but considering it is not eligible, we don't need exec it.
+ * and we let vd_i += r_i / w_i, make scheduler fairness.
+ */
+ next_deadline = se->deadline + vslice;
+ }
+ }
+
+
+ se->deadline = next_deadline;
/*
* The task has consumed its request, reschedule.
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 290874079f60..5c74deec7209 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -24,6 +24,13 @@ SCHED_FEAT(RUN_TO_PARITY, true)
*/
SCHED_FEAT(PREEMPT_SHORT, true)
+/*
+ * For some cases where the tick is faster than expected,
+ * move the deadline forward
+ */
+SCHED_FEAT(FORWARD_DEADLINE, true)
+
+
/*
* Prefer to schedule the task we woke last (assuming it failed
* wakeup-preemption), since its likely going to consume data we
--
2.39.3 (Apple Git-146)
Powered by blists - more mailing lists