Message-ID: <CADHxFxQ8GqndiKGT2z2aUFU5qQQSU1QxQR1CrHsaa8ShrJ6D+Q@mail.gmail.com>
Date: Thu, 5 Jun 2025 22:50:15 +0800
From: hupu <hupu.gm@...il.com>
To: John Stultz <jstultz@...gle.com>
Cc: peterz@...radead.org, linux-kernel@...r.kernel.org, juri.lelli@...hat.com,
vschneid@...hat.com, mingo@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, hupu@...nssion.com
Subject: Re: [RFC 1/1] sched: Skip redundant operations for proxy tasks
needing return migration
Hi John,
>
> > However, there is a potential issue. Whether it is the original code
> > logic or the v2.0 patch mentioned above, directly removing a task from
> > the runqueue in proxy_needs_return undermines the fairness enforced by
> > EEVDF. In fact, clearing the task's `p->se.sched_delayed` variable
> > first (via enqueue_task(ENQUEUE_DELAYED) -> enqueue_task_fair ->
> > requeue_delayed_entity) and then removing task from the runqueue in
> > proxy_needs_return is essentially no different from directly removing
> > the task while retaining its lag. This is because the non-zero lag
> > will eventually be discarded. Therefore, regardless of the method
> > used, the fairness maintained by EEVDF will be compromised.
>
> Hrm. Can you walk me through the specific case you're thinking about here?
>
> Is the idea something like: a mutex blocked task (not sched_delayed)
> gets migrated to a rq, where it acts as a donor so that a lock holder
> can be run.
> If the lock holder sleeps, it might be set as sched_delayed, but the
> donor will be dequeued from the rq and enqueued onto the sched_delayed
> sleeping owner.
>
> And the concern is that in doing this, the donor's lag from the rq it
> was migrated to won't be preserved (since it isn't set as
> sched_delayed)?
>
> I'll need to think on this a bit, as I don't quite have my head around
> how mutex blocked tasks might also end up sched_delayed.
>
There may have been an issue with my earlier analysis, and I apologize
for my oversight.
During previous debugging, I noticed that when handling the
sched_delayed logic in the ttwu_runnable function, there were
instances where the donor's vlag was reset to zero. This raised
concerns for me, as I worried that directly resetting the vlag might
compromise the fairness ensured by EEVDF.
However, upon closer inspection of the code, I realized that this is
actually a normal adjustment of the vlag. The purpose of this process
is to check whether the sched_delayed task has repaid its past "debt"
and determine whether its position in the red-black tree needs to be
updated. The corresponding call stack is as follows:
ttwu_runnable
|-- if (p->se.sched_delayed) enqueue_task(flags=ENQUEUE_DELAYED)
|-- enqueue_task_fair(flags=ENQUEUE_DELAYED)
|-- requeue_delayed_entity(se)
|-- update_entity_lag()
|-- se->vlag = entity_lag(avg_vruntime(cfs_rq), se);
Therefore, the logic in ttwu_runnable does not compromise the fairness of EEVDF.
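To make that vlag adjustment concrete, here is a hypothetical, much-simplified userspace model of the lag computation (the names entity_lag_model and clamp_s64 are my own; the real update_entity_lag() also derives the clamp limit from the entity's slice and load weight, which is omitted here):

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s64;

static s64 clamp_s64(s64 v, s64 lo, s64 hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Simplified model: lag = V - v_i, where V is the rq's average
 * vruntime. A positive lag means the entity received less than its
 * fair share and is owed service; negative means it ran ahead. The
 * clamp bounds how much "debt" or "credit" can be carried over.
 */
static s64 entity_lag_model(s64 avg_vruntime, s64 vruntime, s64 limit)
{
	return clamp_s64(avg_vruntime - vruntime, -limit, limit);
}
```

So resetting the vlag here is not discarding state arbitrarily; it is recomputing the entity's lag against the current average vruntime.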
However, the above call stack gives me another idea: executing the
sched_delayed logic seems redundant if the donor task needs return
migration. The two main pieces of work in the sched_delayed logic,
updating the vlag and adjusting the task's position in the red-black
tree, can be deferred to later stages, as shown in the call stack
below:
a) STEP#1: Update the donor's vlag.
b) STEP#2: Update the donor's vruntime and deadline.
c) STEP#3: Update the donor's position in the red-black tree.
try_to_wake_up
|-- ttwu_runnable
| |-- proxy_needs_return
| |-- deactivate_task
| |-- dequeue_task
| |-- dequeue_task_fair
| |-- dequeue_entities
| |-- dequeue_entity
| |-- update_entity_lag (STEP #1)
| |-- se->vlag = entity_lag()
|
|-- new_cpu = select_task_rq //Re-select a new CPU for the donor task
|-- ttwu_queue(p, new_cpu)
|-- ttwu_do_activate
|-- activate_task(flags=ENQUEUE_WAKEUP)
|-- __activate_task(flags=ENQUEUE_WAKEUP)
|-- enqueue_task(flags=ENQUEUE_WAKEUP)
|-- enqueue_task_fair(flags=ENQUEUE_WAKEUP)
|-- enqueue_entity(flags=ENQUEUE_WAKEUP)
|-- place_entity (STEP #2)
|-- __enqueue_entity (STEP #3)
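The deferral of STEP #1 and STEP #2 can be sketched with a hypothetical, much-simplified model (struct se_model and both function names are my own; the real dequeue path clamps the lag, and the real place_entity() additionally scales the lag by load weights):

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s64;

struct se_model {
	s64 vruntime;
	s64 vlag;
	s64 deadline;
	s64 slice;
};

/* STEP #1 (dequeue_entity -> update_entity_lag): record the lag on dequeue. */
static void dequeue_record_lag(struct se_model *se, s64 avg_vruntime)
{
	se->vlag = avg_vruntime - se->vruntime;	/* clamping omitted */
}

/* STEP #2 (enqueue_entity -> place_entity): reapply the recorded lag on
 * the destination rq, then set the new deadline. */
static void place_model(struct se_model *se, s64 new_avg_vruntime)
{
	se->vruntime = new_avg_vruntime - se->vlag;
	se->deadline = se->vruntime + se->slice;
}
```

In this model, the lag recorded at dequeue time on the old rq is carried across the migration and consumed when the entity is placed on the new rq, which is why the early vlag update in the sched_delayed path is not needed for a task about to be return-migrated.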
In other words, if the donor task needs to be migrated back, the
sched_delayed logic can be skipped entirely. The final PATCH is
therefore as follows: proxy_needs_return is moved ahead of the
sched_delayed logic, and p->se.sched_delayed is forcibly cleared.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
old mode 100644
new mode 100755
index 06e9924d3f77..f585e28b19eb
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4160,6 +4160,18 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags)
rq = __task_rq_lock(p, &rf);
if (task_on_rq_queued(p)) {
update_rq_clock(rq);
+ if (proxy_needs_return(rq, p)) {
+ /*
+ * Force clear, as the work of updating the donor's vlag
+ * and its position in the red-black tree can be deferred
+ * to later stages.
+ */
+ p->se.sched_delayed = 0;
+
+ _trace_sched_pe_return_migration(p);
+ goto out;
+ }
+
if (p->se.sched_delayed) {
proxy_remove_from_sleeping_owner(p);
enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
@@ -4171,10 +4183,6 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags)
*/
wakeup_preempt(rq, p, wake_flags);
}
- if (proxy_needs_return(rq, p)) {
- _trace_sched_pe_return_migration(p);
- goto out;
- }
ttwu_do_wakeup(p);
ret = 1;
}
A kernel with this PATCH boots successfully. Furthermore, I triggered
the mutex-related test cases with the following command, and it has
been running for approximately 20 hours without any issues.
# while true; do echo 1 > /sys/kernel/test_ww_mutex/run_tests; sleep 10; done
Looking forward to your reply. Thanks.
hupu