Message-ID: <20230814132935.GK776869@hirez.programming.kicks-ass.net>
Date: Mon, 14 Aug 2023 15:29:35 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Chen Yu <yu.c.chen@...el.com>
Cc: kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
lkp@...el.com, linux-kernel@...r.kernel.org, x86@...nel.org,
Ingo Molnar <mingo@...nel.org>,
Mike Galbraith <umgwanakikbuti@...il.com>
Subject: Re: [tip:sched/eevdf] [sched/fair] e0c2ff903c:
phoronix-test-suite.blogbench.Write.final_score -34.8% regression
On Fri, Aug 11, 2023 at 10:42:09AM +0800, Chen Yu wrote:
> Since lkp previously reported a hackbench regression with the eevdf policy
> enabled, I did some experiments and found that with eevdf enabled there are
> more preemptions, and these preemptions can slow down the waker (each waker
> wakes up 20 wakees in hackbench). The reason might be that
> check_preempt_wakeup() preempts the current task more readily under eevdf:
This is true.
> Without eevdf enabled, the /proc/schedstat delta within 5 seconds on CPU8 is:
> Thu Aug 10 11:02:02 2023 cpu8
> .stats.check_preempt_count 51973 <-----
> .stats.need_preempt_count 10514 <-----
> .stats.rq_cpu_time 5004068598
> .stats.rq_sched_info.pcount 60374
> .stats.rq_sched_info.run_delay 80405664582
> .stats.sched_count 60609
> .stats.sched_goidle 227
> .stats.ttwu_count 56250
> .stats.ttwu_local 14619
>
> The preemption success ratio is 10514 / 51973 = 20.23%
> -----------------------------------------------------------------------------
>
> With eevdf enabled, the /proc/schedstat delta within 5 seconds on CPU8 is:
> Thu Aug 10 10:22:55 2023 cpu8
> .stats.check_preempt_count 71673 <----
> .stats.low_gran_preempt_count 57410
> .stats.need_preempt_count 57413 <----
> .stats.rq_cpu_time 5007778990
> .stats.rq_sched_info.pcount 129233
> .stats.rq_sched_info.run_delay 164830921362
> .stats.sched_count 129233
> .stats.ttwu_count 70222
> .stats.ttwu_local 66847
>
> The preemption success ratio is 57413 / 71673 = 80.10%
note: wakeup-preemption
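For reference, the two ratios quoted above can be reproduced with a small helper. This is only a sketch: preempt_success_pct() is a hypothetical name, and its two arguments are assumed to be the need_preempt_count and check_preempt_count deltas from the quoted schedstat output.

```c
/*
 * Hypothetical helper: percentage of wakeup-preemption checks that
 * ended in an actual preemption. Both arguments are counter deltas
 * taken over the same sampling window.
 */
static double preempt_success_pct(unsigned long long need_preempt,
				  unsigned long long check_preempt)
{
	/* No checks in the window: report 0 instead of dividing by zero. */
	if (check_preempt == 0)
		return 0.0;

	return 100.0 * (double)need_preempt / (double)check_preempt;
}
```

Feeding it the quoted numbers, preempt_success_pct(10514, 51973) gives roughly 20.23 and preempt_success_pct(57413, 71673) roughly 80.10.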
> According to low_gran_preempt_count, most successful preemptions happen
> when current->vruntime is smaller than wakee->vruntime + sysctl_sched_wakeup_granularity,
> which would not happen in current cfs's wakeup_preempt_entity().
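For comparison, the CFS-style granularity check referred to here can be modelled roughly as below. This is a simplified userspace sketch, not the kernel code: struct toy_entity and toy_wakeup_preempt_entity() are hypothetical names, and gran is passed in directly (the kernel derives it from sysctl_sched_wakeup_granularity, scaled by the wakee's weight).

```c
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for a scheduling entity. */
struct toy_entity {
	uint64_t vruntime;
};

/*
 * Returns  1 if the wakee is ahead of curr by more than gran (preempt),
 *          0 if the wakee is ahead, but only within gran (no preempt),
 *         -1 if the wakee is not ahead of curr at all (no preempt).
 */
static int toy_wakeup_preempt_entity(const struct toy_entity *curr,
				     const struct toy_entity *wakee,
				     int64_t gran)
{
	/* Signed difference so the comparison survives u64 wraparound. */
	int64_t vdiff = (int64_t)(curr->vruntime - wakee->vruntime);

	if (vdiff <= 0)
		return -1;

	if (vdiff > gran)
		return 1;

	return 0;
}
```

The case counted by low_gran_preempt_count corresponds to the "return 0" branch in this model: the wakee is ahead of current, but by less than the granularity, so CFS would leave current running.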
>
> It seems that eevdf does not inhibit wakeup preemption as much as cfs does,
> maybe because eevdf needs to consider fairness more?
Not fairness, latency. Because it wants to honour the virtual deadline.

Are these wakeup preemptions typically on runqueues that have only a
single other task?

That is, consider a single task running, then avg_vruntime will be its
vruntime, because the average of one variable must be the value of that
one variable.

Then the moment a second task joins, we get two options:

 - positive lag
 - negative lag

When the new task has negative lag, it gets placed to the right of the
currently running task (and avg_vruntime has a forward discontinuity).
At this point the new task is not eligible and does not get to run.

When the new task has positive lag, it gets placed to the left of the
currently running task (and avg_vruntime has a backward discontinuity).
At this point the currently running task is no longer eligible, and the
new task must be selected -- irrespective of its deadline.
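
The two cases can be checked with a toy model. This is a sketch under assumptions, not the kernel implementation: struct toy_task, toy_avg_vruntime() and toy_eligible() are hypothetical names, the average is the plain weighted mean of vruntime, a task counts as eligible when its vruntime is not ahead of that average, and the new task is simply dropped at a chosen vruntime without modelling how placement derives it from lag.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down task: virtual runtime and load weight. */
struct toy_task {
	int64_t vruntime;
	int64_t weight;
};

/* Weighted average vruntime V = sum(w_j * v_j) / sum(w_j). */
static int64_t toy_avg_vruntime(const struct toy_task *tasks, size_t n)
{
	int64_t sum = 0, load = 0;

	for (size_t j = 0; j < n; j++) {
		sum += tasks[j].weight * tasks[j].vruntime;
		load += tasks[j].weight;
	}

	return load ? sum / load : 0;
}

/*
 * Task i is eligible iff v_i <= V. Multiplying through by the total
 * weight avoids the division: sum(w_j * (v_j - v_i)) >= 0.
 */
static int toy_eligible(const struct toy_task *tasks, size_t n, size_t i)
{
	int64_t sum = 0;

	for (size_t j = 0; j < n; j++)
		sum += tasks[j].weight *
		       (tasks[j].vruntime - tasks[i].vruntime);

	return sum >= 0;
}
```

With a single task at vruntime 1000, the average is 1000 and the task is eligible. Add an equal-weight task at 900 (to the left) and the average drops to 950, so the running task stops being eligible while the new one is. Add it at 1100 instead (to the right) and the average jumps to 1050, so the running task stays eligible and the new one is not.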

The paper doesn't (AFAIR) consider the case of wake-up-preemption
explicitly. It only considers task selection and vruntime placement.

One option I suppose would be to gate the wakeup preemption by virtual
deadline, only allow when the new task has an earlier deadline than the
currently running one, and otherwise rely on tick preemption.
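
As a standalone illustration of that gate, here is a userspace sketch. The names struct toy_se, toy_deadline_gt() and toy_wakeup_may_preempt() are hypothetical; the comparison is the usual signed-difference idiom, assumed here so that it keeps working when the u64 deadline wraps.

```c
#include <stdint.h>

/* Hypothetical, stripped-down entity carrying only a virtual deadline. */
struct toy_se {
	uint64_t deadline;
};

/* Wraparound-safe test: is a's virtual deadline later than b's? */
static int toy_deadline_gt(const struct toy_se *a, const struct toy_se *b)
{
	return (int64_t)(a->deadline - b->deadline) > 0;
}

/*
 * Gate for wakeup preemption: refuse when the wakee's deadline is
 * later than that of the currently running entity. When this returns
 * 0, preemption is left to the tick.
 */
static int toy_wakeup_may_preempt(const struct toy_se *curr,
				  const struct toy_se *wakee)
{
	return !toy_deadline_gt(wakee, curr);
}
```

Note that in this sketch equal deadlines still pass the gate, since only a strictly later wakee deadline is rejected.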

NOTE: poking at wakeup preemption is a double-edged sword, some
workloads love it, some hate it. Touching it is bound to upset the
balance -- again.

(also, did I get that the right way around? -- I've got a Monday brain
that isn't willing to boot properly)

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fe5be91c71c7..16d24e5dda8f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8047,6 +8047,15 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
 	cfs_rq = cfs_rq_of(se);
 	update_curr(cfs_rq);
 
+	if (sched_feat(WAKEUP_DEADLINE)) {
+		/*
+		 * Only allow preemption if the virtual deadline of the new
+		 * task is before the virtual deadline of the existing task.
+		 */
+		if (deadline_gt(deadline, pse, se))
+			return;
+	}
+
 	/*
 	 * XXX pick_eevdf(cfs_rq) != se ?
 	 */
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 61bcbf5e46a4..e733981b32aa 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -24,6 +24,7 @@ SCHED_FEAT(CACHE_HOT_BUDDY, true)
  * Allow wakeup-time preemption of the current task:
  */
 SCHED_FEAT(WAKEUP_PREEMPTION, true)
+SCHED_FEAT(WAKEUP_DEADLINE, true)
 
 SCHED_FEAT(HRTICK, false)
 SCHED_FEAT(HRTICK_DL, false)