[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtBXxFC_FJHbnRafN3-6Fs=kJxMvGaStiKtp8T06p5Xr4A@mail.gmail.com>
Date: Mon, 14 Sep 2020 14:26:18 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Jiang Biao <benbjiang@...il.com>
Cc: Aubrey Li <aubrey.li@...el.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <valentin.schneider@....com>,
Tim Chen <tim.c.chen@...ux.intel.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Aubrey Li <aubrey.li@...ux.intel.com>
Subject: Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask
in sched domain
On Sun, 13 Sep 2020 at 05:59, Jiang Biao <benbjiang@...il.com> wrote:
>
> Hi, Aubrey
>
> On Fri, 11 Sep 2020 at 23:48, Aubrey Li <aubrey.li@...el.com> wrote:
> >
> > Added idle cpumask to track idle cpus in sched domain. When a CPU
> > enters idle, its corresponding bit in the idle cpumask will be set,
> > and when the CPU exits idle, its bit will be cleared.
> >
> > When a task wakes up to select an idle cpu, scanning idle cpumask
> > has low cost than scanning all the cpus in last level cache domain,
> > especially when the system is heavily loaded.
> >
> > Signed-off-by: Aubrey Li <aubrey.li@...ux.intel.com>
> > ---
> > include/linux/sched/topology.h | 13 +++++++++++++
> > kernel/sched/fair.c | 4 +++-
> > kernel/sched/topology.c | 2 +-
> > 3 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > index fb11091129b3..43a641d26154 100644
> > --- a/include/linux/sched/topology.h
> > +++ b/include/linux/sched/topology.h
> > @@ -65,8 +65,21 @@ struct sched_domain_shared {
> > atomic_t ref;
> > atomic_t nr_busy_cpus;
> > int has_idle_cores;
> > + /*
> > + * Span of all idle CPUs in this domain.
> > + *
> > + * NOTE: this field is variable length. (Allocated dynamically
> > + * by attaching extra space to the end of the structure,
> > + * depending on how many CPUs the kernel has booted up with)
> > + */
> > + unsigned long idle_cpus_span[];
> > };
> >
> > +static inline struct cpumask *sds_idle_cpus(struct sched_domain_shared *sds)
> > +{
> > + return to_cpumask(sds->idle_cpus_span);
> > +}
> > +
> > struct sched_domain {
> > /* These fields must be setup */
> > struct sched_domain __rcu *parent; /* top domain must be null terminated */
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 6b3b59cc51d6..3b6f8a3589be 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6136,7 +6136,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> >
> > time = cpu_clock(this);
> >
> > - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > + cpumask_and(cpus, sds_idle_cpus(sd->shared), p->cpus_ptr);
> Is the sds_idle_cpus() always empty if nohz=off?
Good point
> Do we need to initialize the idle_cpus_span with sched_domain_span(sd)?
>
> >
> > for_each_cpu_wrap(cpu, cpus, target) {
> > if (!--nr)
> > @@ -10182,6 +10182,7 @@ static void set_cpu_sd_state_busy(int cpu)
> > sd->nohz_idle = 0;
> >
> > atomic_inc(&sd->shared->nr_busy_cpus);
> > + cpumask_clear_cpu(cpu, sds_idle_cpus(sd->shared));
> > unlock:
> > rcu_read_unlock();
> > }
> > @@ -10212,6 +10213,7 @@ static void set_cpu_sd_state_idle(int cpu)
> > sd->nohz_idle = 1;
> >
> > atomic_dec(&sd->shared->nr_busy_cpus);
> > + cpumask_set_cpu(cpu, sds_idle_cpus(sd->shared));
> This only works when entering/exiting tickless mode? :)
> Why not update idle_cpus_span during tick_nohz_idle_enter()/exit()?
set_cpu_sd_state_busy is only called during a tick in order to limit
the rate of the update to once per tick per cpu at most and prevents
any kind of storm of update if short running tasks wake/sleep all the
time. We don't want to update a cpumask at each and every enter/leave
idle.
>
> Thx.
> Regards,
> Jiang
>
> > unlock:
> > rcu_read_unlock();
> > }
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9079d865a935..92d0aeef86bf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -1769,7 +1769,7 @@ static int __sdt_alloc(const struct cpumask *cpu_map)
> >
> > *per_cpu_ptr(sdd->sd, j) = sd;
> >
> > - sds = kzalloc_node(sizeof(struct sched_domain_shared),
> > + sds = kzalloc_node(sizeof(struct sched_domain_shared) + cpumask_size(),
> > GFP_KERNEL, cpu_to_node(j));
> > if (!sds)
> > return -ENOMEM;
> > --
> > 2.25.1
> >
Powered by blists - more mailing lists