linux-kernel - Re: [PATCH] sched/eevdf: Update se->vprot in reweight

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtBiDXwxCLEZNbn48dANdt8Od4ZY-idByEJ4kfEQ0eg0mw@mail.gmail.com>
Date: Fri, 23 Jan 2026 18:32:57 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: "wangtao (EQ)" <wangtao554@...wei.com>
Cc: K Prateek Nayak <kprateek.nayak@....com>, mingo@...hat.com, peterz@...radead.org, 
	juri.lelli@...hat.com, dietmar.eggemann@....com, rostedt@...dmis.org, 
	bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com, 
	tanghui20@...wei.com, zhangqiao22@...wei.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/eevdf: Update se->vprot in reweight_entity()

On Thu, 22 Jan 2026 at 08:52, wangtao (EQ) <wangtao554@...wei.com> wrote:
>
> Hi Vincent,
> Thanks for the feedback.
>
> Vincent Guittot 写道:
> > On Wed, 21 Jan 2026 at 10:33, wangtao (EQ) <wangtao554@...wei.com> wrote:
> >>
> >> Hello Prateek,
> >>
> >> K Prateek Nayak 写道:
> >>> Hello Wang,
> >>>
> >>> On 1/20/2026 6:01 PM, Wang Tao wrote:
> >>>> In the EEVDF framework with Run-to-Parity protection, `se->vprot` is an
> >>>> independent variable defining the virtual protection timestamp.
> >>>>
> >>>> When `reweight_entity()` is called (e.g., via nice/renice), it performs
> >>>> the following actions to preserve Lag consistency:
> >>>> 1. Scales `se->vlag` based on the new weight.
> >>>> 2. Calls `place_entity()`, which recalculates `se->vruntime` based on
> >>>>    the new weight and scaled lag.
> >>>>
> >>>> However, the current implementation fails to update `se->vprot`, leading
> >>>> to mismatches between the task's actual runtime and its expected duration.
> >>>
> >>> I don't think that is correct. "vprot" allows for "min_slice" worth of
> >>> runtime from the beginning of the pick however, if we do a
> >>> set_protect_slice() after reweight, we'll essentially grant another
> >>> "min_slice" worth of time from the current "se->vruntime" (or until
> >>> deadline if it is sooner) which is not correct.
> >>>
> >>>>
> >>>> This patch fixes the issue by calling `set_protect_slice()` at the end of
> >>>> `reweight_entity()`. This ensures that a new, valid protection slice is
> >>>> committed based on the updated `vruntime` and the new weight, restoring
> >>>> Run-to-Parity consistency immediately after a weight change.
> >>>>
> >>>> Fixes: 63304558ba5d ("sched/eevdf: Curb wakeup-preemption")
> >>>> Suggested-by: Zhang Qiao <zhangqiao22@...wei.com>
> >>>> Signed-off-by: Wang Tao <wangtao554@...wei.com>
> >>>> ---
> >>>>  kernel/sched/fair.c | 2 ++
> >>>>  1 file changed, 2 insertions(+)
> >>>>
> >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>>> index e71302282671..bdd8c4e688ae 100644
> >>>> --- a/kernel/sched/fair.c
> >>>> +++ b/kernel/sched/fair.c
> >>>> @@ -3792,6 +3792,8 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
> >>>
> >>> At the beginning of reweight_entity, we should first check if
> >>> protect_slice() for curr is true or not:
> >>>
> >>>     bool protect_slice = curr && protect_slice(se);
> >>>
> >>>>              if (!curr)
> >>>>                      __enqueue_entity(cfs_rq, se);
> >>>>              cfs_rq->nr_queued++;
> >>>> +            if (curr)
> >>>> +                    set_protect_slice(cfs_rq, se);
> >>>
> >>>
> >>> If protect_slice() was true to begin with, we should do:
> >>>
> >>>     if (protect_slice)
> >>>         se->vprot = min_vruntime(se->vprot, se->deadline);
> >>>
> >>> This ensures that if our deadline has moved back, we only protect until
> >>> the new deadline and the scheduler can re-evaluate after that. If there
> >>> was an entity with a shorter slice at the beginning of the pick, the
> >>> "vprot" should still reflect the old value that was calculated using
> >>> "se->vruntime" at the time of the pick.
> >>>
> >>
> >> Regarding your suggestion, I have a concern.
> >> When a task's weight changes, I believe its "vprot" should also change
> >> accordingly. If we keep the original "vprot" unchanged, the task's
> >> actual runtime will not reach the expected "vprot".
> >>
> >> For example, if the weight decreases, the "vruntime" increases faster.
> >> If "vprot" is not updated, the task will hit the limit much earlier than
> >> expected in physical time.
> >
> > Ok but it's the safest way to stay in the min runnable slice limit so
> > you need to demonstrate that its lag will remain below.
> > Run to parity already goes close to this limit by not looking at a new
> > running entity at every tick so we need to be sure to not break this
> > rule.
> >
>
> I fully agree that staying within the Lag limit is paramount and we must
> ensure the scheduler invariant is preserved.
>
> >>
> >> Therefore, I made this modification to ensure the protection slice is
> >> valid. I would like to hear your thoughts on this.
> >>
> >>         if (protect_slice) {
> >>             se->vprot -= se->vruntime;
> >>             se->vprot = div_s64(se->vprot * se->load.weight, weight);
> >>             se->vprot += se->vruntime;
> >>
> >>             se->vprot = min_vruntime(se->vprot, se->deadline);
> >
> > Why not use update_protect_slice() like when a new task with a shorter
> > slice is added ?
>
> I looked into reusing update_protect_slice(), but there is a
> mathematical hurdle specific to reweight_entity(): the coordinate space
> distortion.
>
> In reweight_entity(), se->vruntime is scaled to match the new weight,
> but se->vprot remains in the old weight's coordinate system. If we
> simply call update_protect_slice() or compare them directly, we are
> comparing values with different virtual time speeds, which leads to
> incorrect truncation.
>
> >
> >>         }
> >>
> >> Best Regards,
> >> Tao
> >
>
> To address your safety concerns while fixing the coordinate mismatch, I
> have implemented some changes:
>
> Capture: Calculate the remaining physical slice before se->vruntime is
> updated (while both are still in the old coordinate system).
>
> Scale: Project this physical slice to the new weight's virtual domain.
>
> Clamp (Safety): Apply the new slice to the new vruntime, but strictly
> clamp it with min_vruntime(..., se->deadline).
>
> This approach preserves the physical time promise (fixing the bug) but
> includes the min_vruntime guardrail you suggested to ensure we never
> exceed the deadline/lag limit.
>
> Does this align with what you had in mind?

yes

>
> Best,
> Tao
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bdd8c4e688ae..52dc7deea7a1 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3755,10 +3755,17 @@ static void reweight_entity(struct cfs_rq
> *cfs_rq, struct sched_entity *se,
>                             unsigned long weight)
>  {
>         bool curr = cfs_rq->curr == se;
> +       bool protect_slice = curr && protect_slice(se);
> +       u64 vslice_rem = 0;
>
>         if (se->on_rq) {
> +
> +               if (protect_slice) {
> +                       vslice_rem = se->vprot - se->vruntime;
> +               }
> +
>                 update_entity_lag(cfs_rq, se);
>                 se->deadline -= se->vruntime;
>                 se->rel_deadline = 1;
> @@ -3792,8 +3799,13 @@ static void reweight_entity(struct cfs_rq
> *cfs_rq, struct sched_entity *se,
>                 if (!curr)
>                         __enqueue_entity(cfs_rq, se);
>                 cfs_rq->nr_queued++;
> -               if (curr)
> -                       set_protect_slice(cfs_rq, se);
> +
> +               if (protect_slice && vslice_rem > 0) {
> +                       vslice_rem = div_s64(vslice_rem *
> se->load.weight, weight);
> +                       se->vprot == se->vruntime + vslice_rem;
> +
> +                       se->vprot = min_vruntime(se->vprot, se->deadline);
> +               }
>         }
>  }
>