lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <25b32c05-7155-4290-8e16-64548fd27ce6@huawei.com>
Date: Thu, 22 Jan 2026 15:52:22 +0800
From: "wangtao (EQ)" <wangtao554@...wei.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
CC: K Prateek Nayak <kprateek.nayak@....com>, <mingo@...hat.com>,
	<peterz@...radead.org>, <juri.lelli@...hat.com>, <dietmar.eggemann@....com>,
	<rostedt@...dmis.org>, <bsegall@...gle.com>, <mgorman@...e.de>,
	<vschneid@...hat.com>, <tanghui20@...wei.com>, <zhangqiao22@...wei.com>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched/eevdf: Update se->vprot in reweight_entity()

Hi Vincent,
Thanks for the feedback.

Vincent Guittot 写道:
> On Wed, 21 Jan 2026 at 10:33, wangtao (EQ) <wangtao554@...wei.com> wrote:
>>
>> Hello Prateek,
>>
>> K Prateek Nayak 写道:
>>> Hello Wang,
>>>
>>> On 1/20/2026 6:01 PM, Wang Tao wrote:
>>>> In the EEVDF framework with Run-to-Parity protection, `se->vprot` is an
>>>> independent variable defining the virtual protection timestamp.
>>>>
>>>> When `reweight_entity()` is called (e.g., via nice/renice), it performs
>>>> the following actions to preserve Lag consistency:
>>>> 1. Scales `se->vlag` based on the new weight.
>>>> 2. Calls `place_entity()`, which recalculates `se->vruntime` based on
>>>>    the new weight and scaled lag.
>>>>
>>>> However, the current implementation fails to update `se->vprot`, leading
>>>> to mismatches between the task's actual runtime and its expected duration.
>>>
>>> I don't think that is correct. "vprot" allows for "min_slice" worth of
>>> runtime from the beginning of the pick however, if we do a
>>> set_protect_slice() after reweight, we'll essentially grant another
>>> "min_slice" worth of time from the current "se->vruntime" (or until
>>> deadline if it is sooner) which is not correct.
>>>
>>>>
>>>> This patch fixes the issue by calling `set_protect_slice()` at the end of
>>>> `reweight_entity()`. This ensures that a new, valid protection slice is
>>>> committed based on the updated `vruntime` and the new weight, restoring
>>>> Run-to-Parity consistency immediately after a weight change.
>>>>
>>>> Fixes: 63304558ba5d ("sched/eevdf: Curb wakeup-preemption")
>>>> Suggested-by: Zhang Qiao <zhangqiao22@...wei.com>
>>>> Signed-off-by: Wang Tao <wangtao554@...wei.com>
>>>> ---
>>>>  kernel/sched/fair.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index e71302282671..bdd8c4e688ae 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -3792,6 +3792,8 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
>>>
>>> At the beginning of reweight_entity, we should first check if
>>> protect_slice() for curr is true or not:
>>>
>>>     bool protect_slice = curr && protect_slice(se);
>>>
>>>>              if (!curr)
>>>>                      __enqueue_entity(cfs_rq, se);
>>>>              cfs_rq->nr_queued++;
>>>> +            if (curr)
>>>> +                    set_protect_slice(cfs_rq, se);
>>>
>>>
>>> If protect_slice() was true to begin with, we should do:
>>>
>>>     if (protect_slice)
>>>         se->vprot = min_vruntime(se->vprot, se->deadline);
>>>
>>> This ensures that if our deadline has moved back, we only protect until
>>> the new deadline and the scheduler can re-evaluate after that. If there
>>> was an entity with a shorter slice at the beginning of the pick, the
>>> "vprot" should still reflect the old value that was calculated using
>>> "se->vruntime" at the time of the pick.
>>>
>>
>> Regarding your suggestion, I have a concern.
>> When a task's weight changes, I believe its "vprot" should also change
>> accordingly. If we keep the original "vprot" unchanged, the task's
>> actual runtime will not reach the expected "vprot".
>>
>> For example, if the weight decreases, the "vruntime" increases faster.
>> If "vprot" is not updated, the task will hit the limit much earlier than
>> expected in physical time.
> 
> Ok but it's the safest way to stay in the min runnable slice limit so
> you need to demonstrate that its lag will remain below.
> Run to parity already goes close to this limit by not looking at a new
> running entity at every tick so we need to be sure to not break this
> rule.
> 

I fully agree that staying within the Lag limit is paramount and we must
ensure the scheduler invariant is preserved.

>>
>> Therefore, I made this modification to ensure the protection slice is
>> valid. I would like to hear your thoughts on this.
>>
>>         if (protect_slice) {
>>             se->vprot -= se->vruntime;
>>             se->vprot = div_s64(se->vprot * se->load.weight, weight);
>>             se->vprot += se->vruntime;
>>
>>             se->vprot = min_vruntime(se->vprot, se->deadline);
> 
> Why not use update_protect_slice() like when a new task with a shorter
> slice is added ?

I looked into reusing update_protect_slice(), but there is a
mathematical hurdle specific to reweight_entity(): the coordinate space
distortion.

In reweight_entity(), se->vruntime is scaled to match the new weight,
but se->vprot remains in the old weight's coordinate system. If we
simply call update_protect_slice() or compare them directly, we are
comparing values with different virtual time speeds, which leads to
incorrect truncation.

> 
>>         }
>>
>> Best Regards,
>> Tao
> 

To address your safety concerns while fixing the coordinate mismatch, I
have implemented some changes:

Capture: Calculate the remaining physical slice before se->vruntime is
updated (while both are still in the old coordinate system).

Scale: Project this physical slice to the new weight's virtual domain.

Clamp (Safety): Apply the new slice to the new vruntime, but strictly
clamp it with min_vruntime(..., se->deadline).

This approach preserves the physical time promise (fixing the bug) but
includes the min_vruntime guardrail you suggested to ensure we never
exceed the deadline/lag limit.

Does this align with what you had in mind?

Best,
Tao

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bdd8c4e688ae..52dc7deea7a1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3755,10 +3755,17 @@ static void reweight_entity(struct cfs_rq
*cfs_rq, struct sched_entity *se,
                            unsigned long weight)
 {
        bool curr = cfs_rq->curr == se;
+       bool protect_slice = curr && protect_slice(se);
+       u64 vslice_rem = 0;

        if (se->on_rq) {
+
+               if (protect_slice) {
+                       vslice_rem = se->vprot - se->vruntime;
+               }
+
                update_entity_lag(cfs_rq, se);
                se->deadline -= se->vruntime;
                se->rel_deadline = 1;
@@ -3792,8 +3799,13 @@ static void reweight_entity(struct cfs_rq
*cfs_rq, struct sched_entity *se,
                if (!curr)
                        __enqueue_entity(cfs_rq, se);
                cfs_rq->nr_queued++;
-               if (curr)
-                       set_protect_slice(cfs_rq, se);
+
+               if (protect_slice && vslice_rem > 0) {
+                       vslice_rem = div_s64(vslice_rem *
se->load.weight, weight);
+                       se->vprot == se->vruntime + vslice_rem;
+
+                       se->vprot = min_vruntime(se->vprot, se->deadline);
+               }
        }
 }


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ