[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAKfTPtAjC+sLga2OcxJGwBaovo0RGMtQRQYe9nRVhcLyiHtJSQ@mail.gmail.com>
Date: Tue, 24 Sep 2024 10:57:28 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Pierre Gondois <pierre.gondois@....com>
Cc: linux-kernel@...r.kernel.org, stable@...r.kernel.org,
Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH] sched/fair: Fix integer underflow
On Fri, 13 Sept 2024 at 19:36, Pierre Gondois <pierre.gondois@....com> wrote:
>
> Hello Vincent,
>
> On 9/13/24 18:14, Vincent Guittot wrote:
> > Hi Pierre
> >
> > On Fri, 13 Sept 2024 at 10:58, Pierre Gondois <pierre.gondois@....com> wrote:
> >>
> >> (struct sg_lb_stats).idle_cpus is of type 'unsigned int'.
> >> (local->idle_cpus - busiest->idle_cpus) can underflow to UINT_MAX
> >> for instance, and max_t(long, 0, UINT_MAX) will output UINT_MAX.
> >>
> >> Use lsub_positive() instead of max_t().
> >
> > Have you faced the problem or this patch is based on code review ?
> >
> > we have the below in sched_balance_find_src_group() that should ensure
> > that we have local->idle_cpus > busiest->idle_cpus
> >
> > if (busiest->group_weight > 1 &&
> > local->idle_cpus <= (busiest->idle_cpus + 1)) {
> > /*
> > * If the busiest group is not overloaded
> > * and there is no imbalance between this and busiest
> > * group wrt idle CPUs, it is balanced. The imbalance
> > * becomes significant if the diff is greater than 1
> > * otherwise we might end up to just move the imbalance
> > * on another group. Of course this applies only if
> > * there is more than 1 CPU per group.
> > */
> > goto out_balanced;
> > }
>
> It was with this setup:
> - busiest->group_type == group_overloaded
> so it might not have checked the value. I can check the exact path if desired,
I'm curious that we never faced this before as the change is almost 4
years old now but there are probably some topologies that can end up
in such situation
Also the fix tag should be
Fixes: 16b0a7a1a0af ("sched/fair: Ensure tasks spreading in LLC during LB")
Before this commit, group_overloaded was not comparing number of tasks
With the correct fix tag:
Reviewed-by: Vincent Guittot <vincent.guittot@...aro.org>
>
> Regards,
> Pierre
>
>
> >
> >>
> >> Fixes: 0b0695f2b34a ("sched/fair: Rework load_balance()")
> >> cc: stable@...r.kernel.org
> >> Signed-off-by: Pierre Gondois <pierre.gondois@....com>
> >> ---
> >> kernel/sched/fair.c | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 9057584ec06d..6d9124499f52 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -10775,8 +10775,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
> >> * idle CPUs.
> >> */
> >> env->migration_type = migrate_task;
> >> - env->imbalance = max_t(long, 0,
> >> - (local->idle_cpus - busiest->idle_cpus));
> >> + env->imbalance = local->idle_cpus;
> >> + lsub_positive(&env->imbalance, busiest->idle_cpus);
> >> }
> >>
> >> #ifdef CONFIG_NUMA
> >> --
> >> 2.25.1
> >>
Powered by blists - more mailing lists