Message-ID: <CAKfTPtBPhDQD7XLcqpS_oX68GZ49=kOkhtnKHNT2NoPSN8rR2w@mail.gmail.com>
Date: Thu, 22 Aug 2024 10:21:39 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Hongyan Xia <hongyan.xia2@....com>
Cc: Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org,
kprateek.nayak@....com, wuyun.abel@...edance.com, youssefesmat@...omium.org,
tglx@...utronix.de, efault@....de, Luis Machado <luis.machado@....com>
Subject: Re: [PATCH 10/24] sched/uclamp: Handle delayed dequeue
On Thu, 22 Aug 2024 at 10:19, Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> Hi,
>
> On Wed, 21 Aug 2024 at 15:34, Hongyan Xia <hongyan.xia2@....com> wrote:
> >
> > Hi Peter,
> >
> > Sorry for bombarding this thread in the last couple of days. I'm seeing
> > several issues in the latest tip/sched/core after these patches landed.
> >
> > What I'm seeing now appears to be unbalanced util_est calls. First, I applied
>
> I also see residual util_est on an idle rq, caused by an unbalanced
> pair of util_est_enqueue()/util_est_dequeue() calls.
>
> > the following diff to warn against util_est != 0 when no tasks are on
> > the queue:
> >
> > https://lore.kernel.org/all/752ae417c02b9277ca3ec18893747c54dd5f277f.1724245193.git.hongyan.xia2@arm.com/
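> >
> > The idea of that diff is roughly the following check (just a sketch
> > with an illustrative helper name, not the exact patch; see the link
> > for the real thing):
> >
> > 	/* Once the last task has left, cfs_rq util_est must be 0. */
> > 	static inline void util_est_check(struct cfs_rq *cfs_rq)
> > 	{
> > 		SCHED_WARN_ON(!cfs_rq->h_nr_running &&
> > 			      READ_ONCE(cfs_rq->avg.util_est));
> > 	}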
> >
> > Then, I'm reliably seeing warnings on my Juno board during boot in
> > latest tip/sched/core.
> >
> > If I do the same thing to util_est just like what you did in this uclamp
> > patch, like this:
>
> I think the solution is simpler than your proposal: we just need to
> always call util_est_enqueue() before requeue_delayed_entity().
I was too quick, though; the below doesn't fix the problem.
>
> @@ -6970,11 +6970,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> int rq_h_nr_running = rq->cfs.h_nr_running;
> u64 slice = 0;
>
> - if (flags & ENQUEUE_DELAYED) {
> - requeue_delayed_entity(se);
> - return;
> - }
> -
> /*
> * The code below (indirectly) updates schedutil which looks at
> * the cfs_rq utilization to select a frequency.
> @@ -6983,6 +6978,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> */
> util_est_enqueue(&rq->cfs, p);
>
> + if (flags & ENQUEUE_DELAYED) {
> + requeue_delayed_entity(se);
> + return;
> + }
> +
> /*
> * If in_iowait is set, the code below may not trigger any cpufreq
> * utilization updates, so do it here explicitly with the IOWAIT flag
>
>
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 574ef19df64b..58aac42c99e5 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6946,7 +6946,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >
> > if (flags & ENQUEUE_DELAYED) {
> > requeue_delayed_entity(se);
> > - return;
> > + goto util_est;
> > }
> >
> > /*
> > @@ -6955,7 +6955,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > * Let's add the task's estimated utilization to the cfs_rq's
> > * estimated utilization, before we update schedutil.
> > */
> > - util_est_enqueue(&rq->cfs, p);
> >
> > /*
> > * If in_iowait is set, the code below may not trigger any cpufreq
> > @@ -7050,6 +7049,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > assert_list_leaf_cfs_rq(rq);
> >
> > hrtick_update(rq);
> > +util_est:
> > + if (!p->se.sched_delayed)
> > + util_est_enqueue(&rq->cfs, p);
> > }
> >
> > static void set_next_buddy(struct sched_entity *se);
> > @@ -7173,7 +7175,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
> > */
> > static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > {
> > - util_est_dequeue(&rq->cfs, p);
> > + if (!p->se.sched_delayed)
> > + util_est_dequeue(&rq->cfs, p);
> >
> > if (dequeue_entities(rq, &p->se, flags) < 0) {
> > if (!rq->cfs.h_nr_running)
> >
> > which basically enqueues util_est at the end of enqueue_task_fair(),
> > dequeues util_est at the start of dequeue_task_fair(), and
> > additionally checks p->se.sched_delayed in both paths; with that, the
> > unbalanced issue seems to go away.
> >
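> > In other words, the invariant the diff aims for is (sketch of the
> > rule only, not additional code):
> >
> > 	/*
> > 	 * A task with ->sched_delayed set contributes nothing to
> > 	 * cfs_rq util_est, so util_est_enqueue()/util_est_dequeue()
> > 	 * only run for !sched_delayed tasks and therefore always
> > 	 * pair up:
> > 	 */
> > 	if (!p->se.sched_delayed)
> > 		util_est_enqueue(&rq->cfs, p);	/* end of the enqueue path */
> > 	...
> > 	if (!p->se.sched_delayed)
> > 		util_est_dequeue(&rq->cfs, p);	/* before dequeue_entities() */
> >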
> > Hopefully this helps you in finding where the problem could be.
> >
> > Hongyan
> >
> > On 27/07/2024 11:27, Peter Zijlstra wrote:
> > > Delayed dequeue has tasks sit around on the runqueue that are not
> > > actually runnable -- specifically, they will be dequeued the moment
> > > they get picked.
> > >
> > > One side-effect is that such a task can get migrated, which leads to a
> > > 'nested' dequeue_task() scenario that messes up uclamp if we don't
> > > take care.
> > >
> > > Notably, dequeue_task(DEQUEUE_SLEEP) can 'fail' and keep the task on
> > > the runqueue. This however will have removed the task from uclamp --
> > > per uclamp_rq_dec() in dequeue_task(). So far so good.
> > >
> > > However, if at that point the task gets migrated -- or nice adjusted
> > > or any of a myriad of operations that does a dequeue-enqueue cycle --
> > > we'll pass through dequeue_task()/enqueue_task() again. Without
> > > modification this will lead to a double decrement for uclamp, which is
> > > wrong.
> > >
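> > > To spell the sequence out (a sketch of the scenario above, without
> > > the guards this patch adds):
> > >
> > >   dequeue_task(p, DEQUEUE_SLEEP)
> > >     uclamp_rq_dec(rq, p)       /* uclamp contribution removed */
> > >     ->dequeue_task() 'fails'   /* p stays enqueued, ->sched_delayed set */
> > >
> > >   /* p now gets migrated, nice adjusted, ...: */
> > >   dequeue_task(p, flags)
> > >     uclamp_rq_dec(rq, p)       /* second decrement for p -- wrong */
> > >   enqueue_task(p, flags)
> > >     uclamp_rq_inc(rq, p)
> > >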
> > > Reported-by: Luis Machado <luis.machado@....com>
> > > Reported-by: Hongyan Xia <hongyan.xia2@....com>
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> > > ---
> > > kernel/sched/core.c | 16 +++++++++++++++-
> > > 1 file changed, 15 insertions(+), 1 deletion(-)
> > >
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -1676,6 +1676,9 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
> > > if (unlikely(!p->sched_class->uclamp_enabled))
> > > return;
> > >
> > > + if (p->se.sched_delayed)
> > > + return;
> > > +
> > > for_each_clamp_id(clamp_id)
> > > uclamp_rq_inc_id(rq, p, clamp_id);
> > >
> > > @@ -1700,6 +1703,9 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
> > > if (unlikely(!p->sched_class->uclamp_enabled))
> > > return;
> > >
> > > + if (p->se.sched_delayed)
> > > + return;
> > > +
> > > for_each_clamp_id(clamp_id)
> > > uclamp_rq_dec_id(rq, p, clamp_id);
> > > }
> > > @@ -1979,8 +1985,12 @@ void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
> > > psi_enqueue(p, (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED));
> > > }
> > >
> > > - uclamp_rq_inc(rq, p);
> > > p->sched_class->enqueue_task(rq, p, flags);
> > > + /*
> > > + * Must be after ->enqueue_task() because ENQUEUE_DELAYED can clear
> > > + * ->sched_delayed.
> > > + */
> > > + uclamp_rq_inc(rq, p);
> > >
> > > if (sched_core_enabled(rq))
> > > sched_core_enqueue(rq, p);
> > > @@ -2002,6 +2012,10 @@ inline bool dequeue_task(struct rq *rq, struct task_struct *p, int flags)
> > > psi_dequeue(p, flags & DEQUEUE_SLEEP);
> > > }
> > >
> > > + /*
> > > + * Must be before ->dequeue_task() because ->dequeue_task() can 'fail'
> > > + * and mark the task ->sched_delayed.
> > > + */
> > > uclamp_rq_dec(rq, p);
> > > return p->sched_class->dequeue_task(rq, p, flags);
> > > }
> > >
> > >