Message-ID: <cda196fb-9e4a-4e09-8021-12b67eb9cbcb@amd.com>
Date: Thu, 22 Jan 2026 10:08:42 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>, "wangtao (EQ)"
<wangtao554@...wei.com>
CC: <mingo@...hat.com>, <peterz@...radead.org>, <juri.lelli@...hat.com>,
<dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
<mgorman@...e.de>, <vschneid@...hat.com>, <tanghui20@...wei.com>,
<zhangqiao22@...wei.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched/eevdf: Update se->vprot in reweight_entity()
Hello Vincent,
On 1/21/2026 10:30 PM, Vincent Guittot wrote:
>>> If protect_slice() was true to begin with, we should do:
>>>
>>> 	if (protect_slice)
>>> 		se->vprot = min_vruntime(se->vprot, se->deadline);
>>>
>>> This ensures that if our deadline has moved back, we only protect until
>>> the new deadline and the scheduler can re-evaluate after that. If there
>>> was an entity with a shorter slice at the beginning of the pick, the
>>> "vprot" should still reflect the old value that was calculated using
>>> "se->vruntime" at the time of the pick.
>>>
>>
>> Regarding your suggestion, I have a concern.
>> When a task's weight changes, I believe its "vprot" should also change
>> accordingly. If we keep the original "vprot" unchanged, the task's
>> actual runtime will not reach the expected "vprot".
>>
>> For example, if the weight decreases, the "vruntime" increases faster.
>> If "vprot" is not updated, the task will hit the limit much earlier than
>> expected in physical time.
>
> Ok, but it's the safest way to stay within the min runnable slice
> limit, so you need to demonstrate that the lag will remain below it.
> Run to parity already goes close to this limit by not looking for a new
> entity to run at every tick, so we need to be sure not to break this
> rule.
Wakeups will do an update_protect_slice() at the common ancestors for
RUN_TO_PARITY, but this is also a problem for cgroups, where the current
sched_entities can undergo a reweight at every tick.
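For anyone following along, a minimal sketch of the two helpers as this
thread describes them (names taken from the thread; the exact bodies in
tip may differ):

	/* Sketch only: protection holds until vruntime crosses vprot. */
	static inline bool protect_slice(struct sched_entity *se)
	{
		return (s64)(se->vprot - se->vruntime) > 0;
	}

	/*
	 * Sketch only: update_protect_slice() can only shrink the
	 * window, since it takes a min() against the existing vprot.
	 */
	static inline void update_protect_slice(struct cfs_rq *cfs_rq,
						struct sched_entity *se)
	{
		u64 min_slice = cfs_rq_min_slice(cfs_rq);

		se->vprot = min_vruntime(se->vprot,
					 se->vruntime + scale_delta_fair(min_slice, se));
	}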
For Wang's specific case, the problem is that "se->vprot" would need to
expand when the entity's weight goes down, but update_protect_slice()
does a

	min(se->vprot, se->vruntime + scale_delta_fair(min_slice, se))

which is capped by the original vprot. The reweight can end up with a
case where:

	  se->vprot     < se->deadline < se->vruntime + scale_delta_fair(min_slice, se)
	(at time of pick)                (after reweight)
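To put rough numbers on it (hypothetical values, purely illustrative):
say at pick time se->vruntime = 100 and min_slice maps to 10 units of
vtime at the old weight, so se->vprot = 110. Halve the weight and
vruntime advances twice as fast, so the same wall-clock slice now spans
20 units of vtime and the protection should extend to 120; but
min(110, 100 + 20) keeps it at 110, i.e. the entity is protected for
only half the intended wall-clock slice, which is Wang's complaint
above.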
Ideally, we should redo set_protect_slice() with the same situation as
at the time of the pick, with adjusted vruntime and deadline, and
considering the current set of queued tasks, but that boils down to the
same thing we do for "rel_deadline" today, right?
wakeup_preempt_fair() would have already updated the vprot based on the
min_slice, and since we do a scale_delta_fair(min_slice, se) in
{set,update}_protect_slice(), scaling it with the current entity's
weight shouldn't be a problem, I suppose.
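(Assuming scale_delta_fair(slice, se) scales by NICE_0_LOAD /
se->load.weight, like calc_delta_fair() does: vruntime also advances at
NICE_0_LOAD / weight per unit of wall-clock time, so a vtime window of
scale_delta_fair(min_slice, se) computed with the current weight is
consumed in exactly min_slice of wall-clock time, whatever that weight
is.)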
>
>>
>> Therefore, I made this modification to ensure the protection slice is
>> valid. I would like to hear your thoughts on this.
>>
>> 	if (protect_slice) {
>> 		se->vprot -= se->vruntime;
>> 		se->vprot = div_s64(se->vprot * se->load.weight, weight);
>> 		se->vprot += se->vruntime;
>>
>> 		se->vprot = min_vruntime(se->vprot, se->deadline);
>> 	}
>
> Why not use update_protect_slice() like when a new task with a shorter
> slice is added ?
Are you suggesting something like the following to keep
protect_slice() updated at enqueue?
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dc905b853c4b..c21fb039038b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6937,15 +6937,20 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	}
 
 	for_each_sched_entity(se) {
+		struct sched_entity *curr;
+
 		cfs_rq = cfs_rq_of(se);
+		curr = cfs_rq->curr;
 
 		update_load_avg(cfs_rq, se, UPDATE_TG);
 		se_update_runnable(se);
 		update_cfs_group(se);
 
 		se->slice = slice;
-		if (se != cfs_rq->curr)
+		if (se != curr)
 			min_vruntime_cb_propagate(&se->run_node, NULL);
+		if (curr && curr->on_rq)
+			update_protect_slice(cfs_rq, curr);
 		slice = cfs_rq_min_slice(cfs_rq);
 
 		cfs_rq->h_nr_runnable += h_nr_runnable;
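The idea being that whenever a newly enqueued entity lowers
cfs_rq_min_slice() at some level, the currently running entity at that
level gets its protection window re-clamped right away rather than at
the next pick. Untested, just to check I've understood your suggestion
correctly.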
--
Thanks and Regards,
Prateek