linux-kernel - Re: [PATCH] sched: fair: make V move forward only

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251202151732.GD2458571@noisy.programming.kicks-ass.net>
Date: Tue, 2 Dec 2025 16:17:32 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: wangtao <tao.wangtao@...or.com>
Cc: "mingo@...hat.com" <mingo@...hat.com>,
	"juri.lelli@...hat.com" <juri.lelli@...hat.com>,
	"vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
	"dietmar.eggemann@....com" <dietmar.eggemann@....com>,
	"rostedt@...dmis.org" <rostedt@...dmis.org>,
	"bsegall@...gle.com" <bsegall@...gle.com>,
	"mgorman@...e.de" <mgorman@...e.de>,
	"vschneid@...hat.com" <vschneid@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	liulu 00013167 <liulu.liu@...or.com>,
	"wangbintian(BintianWang)" <bintian.wang@...or.com>,
	wangzicheng <wangzicheng@...or.com>
Subject: Re: [PATCH] sched: fair: make V move forward only

On Tue, Dec 02, 2025 at 12:43:15PM +0000, wangtao wrote:
> 
> > -----Original Message-----
> > From: Peter Zijlstra <peterz@...radead.org>
> > Sent: Friday, November 28, 2025 5:29 PM
> > To: wangtao <tao.wangtao@...or.com>
> > Cc: mingo@...hat.com; juri.lelli@...hat.com; vincent.guittot@...aro.org;
> > dietmar.eggemann@....com; rostedt@...dmis.org; bsegall@...gle.com;
> > mgorman@...e.de; vschneid@...hat.com; linux-kernel@...r.kernel.org;
> > liulu.liu@...or.com; bintian.wang@...or.com
> > Subject: Re: [PATCH] sched: fair: make V move forward only
> > 
> > On Fri, Nov 28, 2025 at 04:11:18PM +0800, wangtao wrote:
> > > V is the weighted average of entities. Adding tasks with positive lag
> > > or removing tasks with negative lag may cause V to move backward. This
> > > will result in unfair task scheduling,
> > 
> > Have you actually read the paper? Why do you think this breaks fairness?
> > 
> > > causing previously eligible tasks to become ineligible, shorter
> > > runtimes, and more task switches.
> > 
> > None of that is a fairness issue. Those are issues related to when, rather than
> > how much time is given.
> > 
> Maybe my earlier explanation was unclear, so here is a detailed example.

Not unclear, still not sure how its a fairness issue. Yes it increases
context switches, but fairness is about the amount of time distributed,
not about when time is given.

It is entirely reasonable for a task that wakes up to run now if it has
positive lag.

> > > Making V move forward only resolves such issues and simplifies the
> > > code for adding tasks with positive lag.
> > 
> > It breaks a metric ton of math. Which you don't provide updates for.
> > 
> > Yes, the paper is light on dynamic behaviour, but please don't disregard the
> > math like this. Either stay inside the constraints laid out, or provide coherent
> > alternatives. Notably EEVDF is in the same class of scheduling functions as
> > WF2Q and both provide better lag bounds than the simpler WFQ class of
> > schedulers.
> > 
> > The 'zero-lag point is the weighted average of the entities' is a fairly core
> > tenet of EEVDF. Mucking with this *will* mess with the lag bounds.
> > 
> You are right. The reasoning should be based on math. I will submit a new
> patch based on the derivation below.
> 
> When deriving V, we previously assumed \Sum lag_i = 0, but in many cases
> \Sum lag_i is not zero.

Sure, which is where the V adjustments come from.

> The simplest case is when the run queue is empty and we add a task with
> lag not zero, Then \Sum lag_i is clearly not zero.

Well, all of this is only valid under contention. If there is only a
single task, there is no lag. Note how we ignore lag when placing the
first entry.

Still the point remains, you can leave/join with non-zero lag, and in
those cases we have to adjust V.

> Our goal is \Sum lag_i = 0. 

This is equivalent to the statement that the zero-lag point is the
weighted average of vruntime. These two things are interchangeable.

> So when task i finishes running and is removed
> from the queue, its lag is:
> 
>   lag_i = S - s_i = w_i * (V - v_i)

Yes, but taking it out also affects V.

> After some time, when task i is added back,

Note how you state we're adding _i_ back.

> the other tasks' v have
> changed, and V has changed. Let A be V at that moment:
> 
>   \Sum lag_i = \Sum w_i * (A - v_i)

You need to introduce more indices, i was our removed task, but now
you're re-using i to sum over the remaining tasks?

And if indeed you did mean:

	\Sun_j!=i lag_j = \Sum_j!=i w_j * (A - v_j)

                                   \Sum v_j * w_j
Then yes, because A, or rather V = --------------
                                      \Sum w_j

Making both left- and right-hand expressions 0.

> We can compute A from v_i and lag_i:
> 
>   A = \Sum (w_i * v_i + lag_i) / \Sum w_i = \Sum (w_i * v_i + lag_i) / W

You're doing:

lag_j = w_j * (A - v_j)

lag_j 
----- = A - v_j
w_j

lag_j
----- + v_j = A
w_j

lag_j   v_j * w_j
----- + --------- = A
w_j     w_j

lag_j + v_j * w_j
----------------- = A
w_j

Which is creating a new clock that absorbs a non-zero lag sum?

I don't think there's a bound on the difference between A and V. You can
extract unbounded lag from the system. We recently had someone showcase
exactly that, they managed to wrap V backwards far enough to make the
old min_vruntime thing wrap the s64 space and things went sideways real
fast.

> Use A to compute v for the newly added task k.

But you were adding task i, per [*]

> After adding k:
> 
>   A' = (\Sum (w_i * v_i + lag_i) + w_k * v_k + lag_k) / (W + w_k)
> 
> To preserve lag_k, set v_k = A' - lag_k / w_k, then:
> 
>   A' = (\Sum (w_i * v_i + lag_i) + w_k * A') / (W + w_k)
>   A' * (W + w_k) = \Sum (w_i * v_i + lag_i) + w_k * A'
>   A' = \Sum (w_i * v_i + lag_i) / W = A
> 
> This shows that adding task k does not change A.
> Similarly, removing task k does not change A.
> 
> It is easy to get:
> 
>   A = V + (\Sum lag_i) / W
> 
> \Sum lag_i stays around 0 when tasks are added or removed.
> 
> The roles of V and A are:
>   1) Use V to judge whether task is eligible.
>   2) Use V to compute lag when a task is removed.
>   3) Use A to compute v when a task is added.

So I don't think A can work. As mentioned, they can drift unbounded, and
per that they would also break the lag bounds as outlined in lemmas 3
and onwards.

Now, there is a clue in the WF2Q paper, that it is possible to change
the V function. This is mentioned in the future work section of the
paper. I've never gotten around to figuring out which paper that is and
getting a copy.