Message-ID: <84787b6343fd4ac595a185645ced1de5@honor.com>
Date: Tue, 2 Dec 2025 12:43:15 +0000
From: wangtao <tao.wangtao@...or.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: "mingo@...hat.com" <mingo@...hat.com>, "juri.lelli@...hat.com"
<juri.lelli@...hat.com>, "vincent.guittot@...aro.org"
<vincent.guittot@...aro.org>, "dietmar.eggemann@....com"
<dietmar.eggemann@....com>, "rostedt@...dmis.org" <rostedt@...dmis.org>,
"bsegall@...gle.com" <bsegall@...gle.com>, "mgorman@...e.de"
<mgorman@...e.de>, "vschneid@...hat.com" <vschneid@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, liulu 00013167
<liulu.liu@...or.com>, "wangbintian(BintianWang)" <bintian.wang@...or.com>,
wangzicheng <wangzicheng@...or.com>
Subject: RE: [PATCH] sched: fair: make V move forward only
> -----Original Message-----
> From: Peter Zijlstra <peterz@...radead.org>
> Sent: Friday, November 28, 2025 5:29 PM
> To: wangtao <tao.wangtao@...or.com>
> Cc: mingo@...hat.com; juri.lelli@...hat.com; vincent.guittot@...aro.org;
> dietmar.eggemann@....com; rostedt@...dmis.org; bsegall@...gle.com;
> mgorman@...e.de; vschneid@...hat.com; linux-kernel@...r.kernel.org;
> liulu.liu@...or.com; bintian.wang@...or.com
> Subject: Re: [PATCH] sched: fair: make V move forward only
>
> On Fri, Nov 28, 2025 at 04:11:18PM +0800, wangtao wrote:
> > V is the weighted average of entities. Adding tasks with positive lag
> > or removing tasks with negative lag may cause V to move backward. This
> > will result in unfair task scheduling,
>
> Have you actually read the paper? Why do you think this breaks fairness?
>
> > causing previously eligible tasks to become ineligible, shorter
> > runtimes, and more task switches.
>
> None of that is a fairness issue. Those are issues related to when, rather than
> how much time is given.
>
Maybe my earlier explanation was unclear, so here is a detailed example.
The run queue is empty. V is V_0. We add three tasks A, X, B in order.
lag_A = 0, lag_X > 0, lag_B = 0.
To simplify, assume the time gaps between the three additions are so small
that we can treat them as zero.
Add task A: v_A = V_0. After adding A, V does not change, V_A = V_0.
Add task X: v_X = V_A - lag_X / w_X < V_0. After adding X, V_X < V_0.
Add task B: v_B = V_X < V_0 = v_A. After adding B, V_B = V_X < V_0.
X runs first. After X runs, because v_B < v_A, we also get
deadline_B < deadline_A. So the later-added task B runs before A, which
is not fair.
But in reality the three additions have time gaps. Suppose A runs for a
short time before adding X. Let V_A' be V at that moment, and let v_A' be
A's vruntime. We have v_A' = V_A' > V_0.
v_X = V_A' - lag_X / w_X < V_A'. After putting X into the queue, V becomes
smaller: V_X < V_A'. Since v_A' > V_X, A becomes ineligible. Even if A is
still within its protected slice (protect_slice), it is switched out for X
immediately, increasing context switches.
Here is a trace where several tasks sleep for a while and then run briefly:
v_30981 = 16015784959115 = V, schedule 30981
<idle>-0 (...) place_entity ... pid 30981 vruntime 16015784959115 vlag 0 lag 0
sched_switch ... next_pid=30981 ...
After adding 30983, since vlag_30983 > 0:
v_30983 = 16015785028217 < V_30981' = v_30981' = 16015785080365
place_entity ... pid 30983 vruntime 16015785028217 vlag 26074 lag 52148
After adding 30982, even though vlag_30982 < 0, because vlag_30983 is
large: v_30982 = 16015785061209 < V_30981' = 16015785080365
place_entity ... pid 30982 vruntime 16015785061209 vlag -670 lag -3793
Switch to 30983:
sched_switch ... next_pid=30983 ...
Then task 30982 with negative vlag runs before 30981:
sched_switch ... next_pid=30982 ...
sched_switch ... next_pid=30981 ...
sched_switch ... next_pid=0 ...
> > Making V move forward only resolves such issues and simplifies the
> > code for adding tasks with positive lag.
>
> It breaks a metric ton of math. Which you don't provide updates for.
>
> Yes, the paper is light on dynamic behaviour, but please don't disregard the
> math like this. Either stay inside the constraints laid out, or provide coherent
> alternatives. Notably EEVDF is in the same class of scheduling functions as
> WF2Q and both provide better lag bounds than the simpler WFQ class of
> schedulers.
>
> The 'zero-lag point is the weighted average of the entities' is a fairly core
> tenet of EEVDF. Mucking with this *will* mess with the lag bounds.
>
You are right. The reasoning should be based on math. I will submit a new
patch based on the derivation below.
When deriving V, we previously assumed \Sum lag_i = 0, but in many cases
\Sum lag_i is not zero.
The simplest case is an empty run queue to which we add a task with nonzero
lag; then \Sum lag_i is clearly nonzero.
Our goal is \Sum lag_i = 0. So when task i finishes running and is removed
from the queue, its lag is:
lag_i = S_i - s_i = w_i * (V - v_i)
After some time, when task i is added back, the other tasks' v have changed
and so has V. Let A be the reference point against which the stored lags
are preserved:
\Sum lag_i = \Sum w_i * (A - v_i)
Solving for A from the v_i and lag_i:
A = (\Sum (w_i * v_i + lag_i)) / \Sum w_i = (\Sum (w_i * v_i + lag_i)) / W
Use A to compute v for the newly added task k. After adding k:
A' = (\Sum (w_i * v_i + lag_i) + w_k * v_k + lag_k) / (W + w_k)
To preserve lag_k, set v_k = A' - lag_k / w_k, then:
A' = (\Sum (w_i * v_i + lag_i) + w_k * A') / (W + w_k)
A' * (W + w_k) = \Sum (w_i * v_i + lag_i) + w_k * A'
A' = \Sum (w_i * v_i + lag_i) / W = A
This shows that adding task k does not change A.
Similarly, removing task k does not change A.
It is easy to get:
A = V + (\Sum lag_i) / W
With this placement, \Sum lag_i stays close to 0 as tasks are added and
removed.
The roles of V and A are:
1) Use V to judge whether a task is eligible.
2) Use V to compute lag when a task is removed.
3) Use A to compute v when a task is added.
> The delayed dequeue feature tries to address some of these concerns by
> keeping non-eligible (negative lag) tasks on the runqueue until such time
> that they become eligible (approximated by getting picked again) at which
> point they get removed (and any positive lag gets truncated, as if they were
> removed at zero-lag). As a consequence you will have much less removal of
> negative lag, additionally such tasks will be eligible the moment they come
> back.
>
> Also, there is the small matter that your patch simply does not apply.
Thanks,
Tao