Message-ID: <Z67Wy9Jjn0BZa01A@linux.ibm.com>
Date: Fri, 14 Feb 2025 11:08:19 +0530
From: Vishal Chourasia <vishalc@...ux.ibm.com>
To: Phil Auld <pauld@...hat.com>
Cc: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Frederic Weisbecker <frederic@...nel.org>,
Waiman Long <longman@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [CHANGE 1/2] sched/isolation: Make use of more than one
housekeeping cpu
Hi Phil, Vineeth
On Thu, Feb 13, 2025 at 09:26:53AM -0500, Phil Auld wrote:
> On Thu, Feb 13, 2025 at 10:14:04AM +0530 Madadi Vineeth Reddy wrote:
> > Hi Phil Auld,
> >
> > On 11/02/25 19:31, Phil Auld wrote:
> > > The existing code uses housekeeping_any_cpu() to select a cpu for
> > > a given housekeeping task. However, this often ends up calling
> > > cpumask_any_and() which is defined as cpumask_first_and() which has
> > > the effect of always using the first cpu among those available.
> > >
> > > The same applies when multiple NUMA nodes are involved. In that
> > > case the first cpu in the local node is chosen which does provide
> > > a bit of spreading but with multiple HK cpus per node the same
> > > issues arise.
> > >
> > > Spread the HK work out by having housekeeping_any_cpu() and
> > > sched_numa_find_closest() use cpumask_any_and_distribute()
> > > instead of cpumask_any_and().
> > >
> >
> > Got the overall intent of the patch for better load distribution on
> > housekeeping tasks. However, one potential drawback is that
> > spreading HK work across multiple CPUs might reduce the time that
> > some cores can spend in deeper idle states, which can be beneficial for
> > power-sensitive systems.
> >
> > Thoughts?
>
> NOHZ_full setups are not generally used in power sensitive systems I think.
> They aren't in our use cases at least.
>
> In cases with many cpus a single housekeeping cpu can not keep up. Having
> other HK cpus in deep idle states while the one in use is overloaded is
> not a win.
To me, an overloaded CPU is one where more than one task is ready to
run. A HK CPU receives periodic scheduling-clock ticks, so it is bound
to come out of whatever power-saving state it is in.
>
> If your single HK cpu can keep up then only configure that one HK cpu.
> The others will go idle and stay there. And since they are nohz_full
> might get to stay idle even longer.
While it is good to distribute the load across the CPUs in the HK
cpumask (queuing jobs on a different CPU each time), this can cause
jitter in virtualized environments: it unnecessarily evicts other
tenants, when it would be better to overload one VP than to wake up
other VPs of the same tenant.
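
To make the difference between the two selection policies concrete,
here is a rough userspace sketch, not the kernel code. It assumes the
mask fits in a single 64-bit word, and the names pick_first_and(),
pick_distribute_and() and NR_CPUS_SKETCH are made up for illustration.
The kernel helpers operate on struct cpumask, and as far as I know the
distribute variant remembers its previous pick in a per-CPU variable
rather than taking it as an argument.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 64	/* assumption: whole mask fits in one word */

/*
 * First-fit model of cpumask_any_and(): always returns the lowest CPU
 * set in both masks, or NR_CPUS_SKETCH if the intersection is empty.
 */
static unsigned int pick_first_and(uint64_t a, uint64_t b)
{
	uint64_t m = a & b;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (m & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPUS_SKETCH;
}

/*
 * Round-robin model of cpumask_any_and_distribute(): returns the next
 * CPU set in both masks after *prev, wrapping around, and records the
 * pick in *prev. Returns NR_CPUS_SKETCH if the intersection is empty.
 */
static unsigned int pick_distribute_and(uint64_t a, uint64_t b,
					unsigned int *prev)
{
	uint64_t m = a & b;
	unsigned int i, cpu;

	for (i = 1; i <= NR_CPUS_SKETCH; i++) {
		cpu = (*prev + i) % NR_CPUS_SKETCH;
		if (m & (UINT64_C(1) << cpu)) {
			*prev = cpu;
			return cpu;
		}
	}
	return NR_CPUS_SKETCH;
}
```

With a hypothetical HK mask of CPUs 0-3 (0x0F), all of them online, and
prev starting at 0, pick_first_and() returns 0 on every call, while
successive pick_distribute_and() calls return 1, 2, 3, 0, 1, ... That
rotation is the behaviour I am worried about in the virtualized case:
every call potentially wakes a different VP.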
>
> I do have a patch that has this controlled by a sched feature if that
> is of interest. Then it could be disabled if you don't want it.
Vishal
>
> Cheers,
> Phil
>
> >
> > Thanks,
> > Madadi Vineeth Reddy
> >
> > > Signed-off-by: Phil Auld <pauld@...hat.com>
> > > Cc: Peter Zijlstra <peterz@...radead.org>
> > > Cc: Juri Lelli <juri.lelli@...hat.com>
> > > Cc: Frederic Weisbecker <frederic@...nel.org>
> > > Cc: Waiman Long <longman@...hat.com>
> > > Cc: linux-kernel@...r.kernel.org
> > > ---
> > > kernel/sched/isolation.c | 2 +-
> > > kernel/sched/topology.c | 2 +-
> > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > > index 81bc8b329ef1..93b038d48900 100644
> > > --- a/kernel/sched/isolation.c
> > > +++ b/kernel/sched/isolation.c
> > > @@ -40,7 +40,7 @@ int housekeeping_any_cpu(enum hk_type type)
> > > if (cpu < nr_cpu_ids)
> > > return cpu;
> > >
> > > - cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
> > > + cpu = cpumask_any_and_distribute(housekeeping.cpumasks[type], cpu_online_mask);
> > > if (likely(cpu < nr_cpu_ids))
> > > return cpu;
> > > /*
> > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > > index c49aea8c1025..94133f843485 100644
> > > --- a/kernel/sched/topology.c
> > > +++ b/kernel/sched/topology.c
> > > @@ -2101,7 +2101,7 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
> > > for (i = 0; i < sched_domains_numa_levels; i++) {
> > > if (!masks[i][j])
> > > break;
> > > - cpu = cpumask_any_and(cpus, masks[i][j]);
> > > + cpu = cpumask_any_and_distribute(cpus, masks[i][j]);
> > > if (cpu < nr_cpu_ids) {
> > > found = cpu;
> > > break;
> >
>
> --
>