Message-ID: <CAKfTPtD27L0Epg7wPzw7G2zDL8XgdVbB45dZEZEsLmuwRp5gcg@mail.gmail.com>
Date: Tue, 17 Nov 2020 16:53:10 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Valentin Schneider <valentin.schneider@....com>,
Juri Lelli <juri.lelli@...hat.com>
Subject: Re: [PATCH 3/3] sched/numa: Limit the amount of imbalance that can
exist at fork time
On Tue, 17 Nov 2020 at 16:17, Mel Gorman <mgorman@...hsingularity.net> wrote:
>
> On Tue, Nov 17, 2020 at 03:31:19PM +0100, Vincent Guittot wrote:
> > On Tue, 17 Nov 2020 at 15:18, Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > On Tue, Nov 17, 2020 at 01:42:22PM +0000, Mel Gorman wrote:
> > > > - if (local_sgs.idle_cpus)
> > > > + if (local_sgs.idle_cpus >= (sd->span_weight >> 2))
> > > > return NULL;
> > >
> > > Is that the same 25% ?
> >
> > same question for me
>
> It's the same 25%. It's in the comment as -- utilisation is not too high
"utilization" is misleading: it usually refers to PELT utilization,
whereas what you check is the number of busy CPUs
> where "high" is related to adjust_numa_imbalance.
>
> > could we encapsulate this 25% allowed imbalance like for adjust_numa_imbalance
>
> Using adjust_numa_imbalance() directly in this context would be awkward
It would be good to use the same function to decide whether or not we
allow the imbalance, something like
numa_allow_imbalance(sg_lb_stats * group_stats),
which would return how much margin remains available before the
imbalance is no longer allowed,
and to use the same metrics in all cases
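
To make the suggestion concrete, here is a rough userspace sketch of what
such a helper could look like. Everything in it is an assumption for
illustration only: the struct is a cut-down stand-in for the kernel's
sg_lb_stats, the choice of sum_nr_running as the metric is just one option,
and numa_allow_imbalance() does not exist in the tree.

```c
/*
 * Hypothetical stand-in for the kernel's struct sg_lb_stats, reduced to
 * the two fields this sketch needs.
 */
struct sg_lb_stats {
	unsigned int sum_nr_running;	/* tasks running in the group */
	unsigned int group_weight;	/* number of CPUs in the group */
};

/* Allow imbalance until running tasks reach 25% of the group. */
#define IMBALANCE_THRESHOLD_SHIFT 2

/*
 * Hypothetical helper: return how many more running tasks the group can
 * take before the imbalance is no longer allowed. 0 means "not allowed".
 */
static inline int numa_allow_imbalance(const struct sg_lb_stats *group_stats)
{
	int threshold = group_stats->group_weight >> IMBALANCE_THRESHOLD_SHIFT;
	int margin = threshold - (int)group_stats->sum_nr_running;

	return margin > 0 ? margin : 0;
}
```

With a 48-CPU group the threshold is 12, so a group with 4 running tasks
would report a margin of 8 and a group with 12 running tasks a margin of 0.
Both the fork path and the load balancer could then test the same return
value instead of open-coding different comparisons.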
> but the threshold could be shared with something like the additional
> diff below. Is that what you had in mind or something different?
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index adfab218a498..49ef3484c56c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5878,6 +5878,8 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> static struct sched_group *
> find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu);
>
> +static inline int numa_imbalance_threshold(int weight);
> +
> /*
> * find_idlest_group_cpu - find the idlest CPU among the CPUs in the group.
> */
> @@ -8894,7 +8896,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
> * If there is a real need of migration, periodic load
> * balance will take care of it.
> */
> - if (local_sgs.idle_cpus >= (sd->span_weight >> 2))
Also, here you use idle_cpus whereas in the other place you use nr_running.
Can't we use the same metrics on both sides?
> + if (local_sgs.idle_cpus >= numa_imbalance_threshold(sd->span_weight))
> return NULL;
> }
>
> @@ -8998,6 +9000,14 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>
> #define NUMA_IMBALANCE_MIN 2
>
> +/* Allows imbalances until active CPUs hits 25% of a domain */
> +#define IMBALANCE_THRESHOLD_SHIFT 2
> +
> +static inline int numa_imbalance_threshold(int weight)
> +{
> + return weight >> IMBALANCE_THRESHOLD_SHIFT;
> +}
> +
> static inline long adjust_numa_imbalance(int imbalance,
> int dst_running, int dst_weight)
> {
> @@ -9006,8 +9016,10 @@ static inline long adjust_numa_imbalance(int imbalance,
> * when the destination is lightly loaded so that pairs of
> * communicating tasks may remain local.
> */
> - if (dst_running < (dst_weight >> 2) && imbalance <= NUMA_IMBALANCE_MIN)
> + if (dst_running < numa_imbalance_threshold(dst_weight) &&
> + imbalance <= NUMA_IMBALANCE_MIN) {
> return 0;
> + }
>
> return imbalance;
> }
>
> --
> Mel Gorman
> SUSE Labs
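
To illustrate the metrics question, here is a simplified userspace model of
the two checks as I read them from the hunks above. It assumes one task per
busy CPU and the same weight on both sides, which is not necessarily how the
kernel calls them, and it leaves out the NUMA_IMBALANCE_MIN part of the
second check. The names fork_allows_imbalance() and lb_allows_imbalance()
are made up for this sketch.

```c
#include <stdbool.h>

#define IMBALANCE_THRESHOLD_SHIFT 2

/* Same arithmetic as the proposed helper: 25% of the weight. */
static inline int numa_imbalance_threshold(int weight)
{
	return weight >> IMBALANCE_THRESHOLD_SHIFT;
}

/*
 * Model of the find_idlest_group() check: stay local while at least
 * 25% of the CPUs are idle.
 */
static inline bool fork_allows_imbalance(int idle_cpus, int weight)
{
	return idle_cpus >= numa_imbalance_threshold(weight);
}

/*
 * Model of the adjust_numa_imbalance() check: ignore the imbalance while
 * fewer than 25% of the CPUs have a running task.
 */
static inline bool lb_allows_imbalance(int nr_running, int weight)
{
	return nr_running < numa_imbalance_threshold(weight);
}
```

Under those assumptions, with weight 48 and 20 busy CPUs, the first check
sees 28 idle CPUs >= 12 and allows the imbalance, while the second sees
20 running tasks, which is not below 12, and does not. That is the kind of
disagreement a single shared metric would avoid.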