Message-ID: <CAKfTPtA6dUWn1DtQbzCvk19RygO9qzZbxz86yWAC3LGKcSL3cA@mail.gmail.com>
Date: Fri, 23 Jan 2026 17:38:15 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: "wangtao (EQ)" <wangtao554@...wei.com>, mingo@...hat.com, peterz@...radead.org,
juri.lelli@...hat.com, dietmar.eggemann@....com, rostedt@...dmis.org,
bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
tanghui20@...wei.com, zhangqiao22@...wei.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/eevdf: Update se->vprot in reweight_entity()
On Thu, 22 Jan 2026 at 05:38, K Prateek Nayak <kprateek.nayak@....com> wrote:
>
> Hello Vincent,
>
> On 1/21/2026 10:30 PM, Vincent Guittot wrote:
> >>> If protect_slice() was true to begin with, we should do:
> >>>
> >>>     if (protect_slice)
> >>>         se->vprot = min_vruntime(se->vprot, se->deadline);
> >>>
> >>> This ensures that if our deadline has moved back, we only protect until
> >>> the new deadline and the scheduler can re-evaluate after that. If there
> >>> was an entity with a shorter slice at the beginning of the pick, the
> >>> "vprot" should still reflect the old value that was calculated using
> >>> "se->vruntime" at the time of the pick.
> >>>
> >>
> >> Regarding your suggestion, I have a concern.
> >> When a task's weight changes, I believe its "vprot" should also change
> >> accordingly. If we keep the original "vprot" unchanged, the task's
> >> actual runtime will no longer match the protection that "vprot" was
> >> meant to provide.
> >>
> >> For example, if the weight decreases, the "vruntime" increases faster.
> >> If "vprot" is not updated, the task will hit the limit much earlier than
> >> expected in physical time.
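
To make wangtao's example concrete, here is a minimal standalone sketch (not
kernel code; span_v, w_old and w_new are made-up values, and the NICE_0_LOAD
division stands in for the real calc_delta_fair() weight scaling) showing that
if the weight halves while vprot is left untouched, the protected window
shrinks to half in wall-clock time:

	#include <stdio.h>
	#include <stdint.h>

	#define NICE_0_LOAD 1024ULL

	int main(void)
	{
		/* Remaining protected span, vprot - vruntime, in vruntime units. */
		uint64_t span_v = 3000000ULL;
		/* Weight before and after the reweight (halved). */
		uint64_t w_old = 1024, w_new = 512;

		/*
		 * vruntime advances by delta * NICE_0_LOAD / weight for a
		 * wall-clock delta, so the wall-clock time needed to consume
		 * span_v is span_v * weight / NICE_0_LOAD.
		 */
		uint64_t wall_old = span_v * w_old / NICE_0_LOAD;
		uint64_t wall_new = span_v * w_new / NICE_0_LOAD;

		printf("protected wall time: %llu ns -> %llu ns\n",
		       (unsigned long long)wall_old,
		       (unsigned long long)wall_new);
		return 0;
	}
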
> >
> > OK, but it's the safest way to stay within the min runnable slice limit,
> > so you need to demonstrate that its lag will remain below the limit.
> > Run to parity already goes close to this limit by not considering a new
> > running entity at every tick, so we need to be sure not to break this
> > rule.
>
> Wakeups will do an update_protect_slice() at the common ancestors for
> RUN_TO_PARITY, but this is also a problem for cgroups, where the current
> sched_entities can undergo a reweight at every tick.
>
> For Wang's specific case, the problem is that "se->vprot" would need to
> expand when the entity's weight goes down, but update_protect_slice() does a
>
>     min(se->vprot, se->vruntime + scale_delta_fair(min_slice, se))
>
> which is capped by the original vprot. The reweight can end up with a
> case where:
>
>     se->vprot (at time of pick)
>         < se->deadline
>         < se->vruntime + scale_delta_fair(min_slice, se) (after reweight)
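
To spell out that capping, a minimal userspace model of the two helpers (a
sketch only: scale_delta() and update_protect() are made-up stand-ins for
scale_delta_fair() and update_protect_slice(), a plain compare replaces
min_vruntime()'s wraparound-safe one, and the NICE_0_LOAD division simplifies
the real weight scaling):

	#include <stdint.h>

	#define NICE_0_LOAD 1024ULL

	/* Translate the smallest queued slice (wall-clock ns) into this
	 * entity's vruntime units at its current weight. */
	static inline uint64_t scale_delta(uint64_t min_slice, uint64_t weight)
	{
		return min_slice * NICE_0_LOAD / weight;
	}

	/* The protection window can only shrink: the result stays capped by
	 * the vprot computed at pick time, even if a reweight has made the
	 * freshly computed bound larger. */
	static inline uint64_t update_protect(uint64_t vprot, uint64_t vruntime,
					      uint64_t min_slice, uint64_t weight)
	{
		uint64_t bound = vruntime + scale_delta(min_slice, weight);

		return bound < vprot ? bound : vprot;
	}
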
>
> Ideally, we should redo set_protect_slice() as if we were back at the
> time of pick, with the adjusted vruntime and deadline and considering
> the current set of queued tasks, but that boils down to the same thing
> we do for "rel_deadline" today, right?
>
> wakeup_preempt_fair() would have already updated the vprot based on
> the min_slice, and since we do a scale_delta_fair(min_slice, se) in
> {set,update}_protect_slice(), scaling it with the current entity's
> weight shouldn't be a problem, I suppose.
>
> >
> >>
> >> Therefore, I made this modification to ensure the protection slice is
> >> valid. I would like to hear your thoughts on this.
> >>
> >>     if (protect_slice) {
> >>         se->vprot -= se->vruntime;
> >>         se->vprot = div_s64(se->vprot * se->load.weight, weight);
> >>         se->vprot += se->vruntime;
> >>
> >>         se->vprot = min_vruntime(se->vprot, se->deadline);
> >>     }
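
The same arithmetic as a standalone helper (again only a sketch: rescale_vprot()
is a made-up name, the types are simplified, a plain compare replaces
min_vruntime(), and it reads se->load.weight above as the old weight and the
weight argument as the new one):

	#include <stdint.h>

	/*
	 * Rescale the remaining protected span so it covers the same
	 * wall-clock time under the new weight, then make sure we never
	 * protect past the deadline. With old_weight > new_weight the span
	 * grows, which is exactly the expansion that update_protect_slice()
	 * would otherwise refuse.
	 */
	static inline uint64_t rescale_vprot(uint64_t vruntime, uint64_t vprot,
					     uint64_t deadline,
					     uint64_t old_weight,
					     uint64_t new_weight)
	{
		uint64_t span = vprot - vruntime;

		span = span * old_weight / new_weight;
		vprot = vruntime + span;

		return vprot < deadline ? vprot : deadline;
	}
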
> >
> > Why not use update_protect_slice(), like when a new task with a shorter
> > slice is added?
>
> Are you suggesting something like the following to keep
> protect_slice() updated at enqueue?
No, I was suggesting never increasing vprot, in order to be sure that
we will not break the lag rule. But wangtao and Peter made another
proposal.
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index dc905b853c4b..c21fb039038b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6937,15 +6937,20 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  	}
> 
>  	for_each_sched_entity(se) {
> +		struct sched_entity *curr;
> +
>  		cfs_rq = cfs_rq_of(se);
> +		curr = cfs_rq->curr;
> 
>  		update_load_avg(cfs_rq, se, UPDATE_TG);
>  		se_update_runnable(se);
>  		update_cfs_group(se);
> 
>  		se->slice = slice;
> -		if (se != cfs_rq->curr)
> +		if (se != curr)
>  			min_vruntime_cb_propagate(&se->run_node, NULL);
> +		if (curr && curr->on_rq)
> +			update_protect_slice(cfs_rq, curr);
>  		slice = cfs_rq_min_slice(cfs_rq);
> 
>  		cfs_rq->h_nr_runnable += h_nr_runnable;
> --
> Thanks and Regards,
> Prateek
>