Message-ID: <20250213142653.GA472203@pauld.westford.csb>
Date: Thu, 13 Feb 2025 09:26:53 -0500
From: Phil Auld <pauld@...hat.com>
To: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Frederic Weisbecker <frederic@...nel.org>,
Waiman Long <longman@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [CHANGE 1/2] sched/isolation: Make use of more than one
housekeeping cpu
On Thu, Feb 13, 2025 at 10:14:04AM +0530 Madadi Vineeth Reddy wrote:
> Hi Phil Auld,
>
> On 11/02/25 19:31, Phil Auld wrote:
> > The existing code uses housekeeping_any_cpu() to select a cpu for
> > a given housekeeping task. However, this often ends up calling
> > cpumask_any_and(), which is defined as cpumask_first_and() and thus
> > has the effect of always using the first cpu among those available.
> >
> > The same applies when multiple NUMA nodes are involved. In that
> > case the first cpu in the local node is chosen which does provide
> > a bit of spreading but with multiple HK cpus per node the same
> > issues arise.
> >
> > Spread the HK work out by having housekeeping_any_cpu() and
> > sched_numa_find_closest() use cpumask_any_and_distribute()
> > instead of cpumask_any_and().
> >
>
> Got the overall intent of the patch for better load distribution of
> housekeeping tasks. However, one potential drawback is that spreading
> HK work across multiple CPUs might reduce the time that some cores can
> spend in deeper idle states, which can be beneficial for
> power-sensitive systems.
>
> Thoughts?
NOHZ_full setups are not generally used in power-sensitive systems, I think.
They aren't in our use cases at least.

In cases with many cpus a single housekeeping cpu cannot keep up. Having
other HK cpus sitting in deep idle states while the one in use is overloaded
is not a win.

If your single HK cpu can keep up then configure only that one HK cpu.
The others will go idle and stay there. And since they are nohz_full they
might get to stay idle even longer.

I do have a patch that puts this behavior behind a sched feature, if that
is of interest. Then it could be disabled if you don't want it.
Cheers,
Phil
>
> Thanks,
> Madadi Vineeth Reddy
>
> > Signed-off-by: Phil Auld <pauld@...hat.com>
> > Cc: Peter Zijlstra <peterz@...radead.org>
> > Cc: Juri Lelli <juri.lelli@...hat.com>
> > Cc: Frederic Weisbecker <frederic@...nel.org>
> > Cc: Waiman Long <longman@...hat.com>
> > Cc: linux-kernel@...r.kernel.org
> > ---
> > kernel/sched/isolation.c | 2 +-
> > kernel/sched/topology.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > index 81bc8b329ef1..93b038d48900 100644
> > --- a/kernel/sched/isolation.c
> > +++ b/kernel/sched/isolation.c
> > @@ -40,7 +40,7 @@ int housekeeping_any_cpu(enum hk_type type)
> > if (cpu < nr_cpu_ids)
> > return cpu;
> >
> > - cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
> > + cpu = cpumask_any_and_distribute(housekeeping.cpumasks[type], cpu_online_mask);
> > if (likely(cpu < nr_cpu_ids))
> > return cpu;
> > /*
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index c49aea8c1025..94133f843485 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -2101,7 +2101,7 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
> > for (i = 0; i < sched_domains_numa_levels; i++) {
> > if (!masks[i][j])
> > break;
> > - cpu = cpumask_any_and(cpus, masks[i][j]);
> > + cpu = cpumask_any_and_distribute(cpus, masks[i][j]);
> > if (cpu < nr_cpu_ids) {
> > found = cpu;
> > break;
>