Message-ID: <a2a16c0e198a6d722b8923b0eec15dd2b32e4320.camel@intel.com>
Date: Thu, 14 Sep 2023 09:23:18 +0000
From: "Zhang, Rui" <rui.zhang@...el.com>
To: "Lu, Aaron" <aaron.lu@...el.com>,
"pierre.gondois@....com" <pierre.gondois@....com>
CC: "peterz@...radead.org" <peterz@...radead.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"Pandruvada, Srinivas" <srinivas.pandruvada@...el.com>,
"vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"dietmar.eggemann@....com" <dietmar.eggemann@....com>,
"tj@...nel.org" <tj@...nel.org>
Subject: Re: [PATCH] sched/fair: Skip cpus with no sched domain attached
during NOHZ idle balance
Hi, Pierre,
>
> Yes right indeed,
> This happens when putting a CPU offline (as you mentioned earlier,
> putting a CPU offline clears the CPU in the idle_cpus_mask).
>
> The load balancing related variables
Which variables do you mean?
> are unused if a CPU has a NULL
> sd as it cannot pull any task. Ideally we should clear them once,
> when attaching a NULL sd to the CPU.
This sounds good to me. But TBH, I don't have enough confidence to do
so because I'm not crystal clear about how these variables are used.
Some questions about the code below.
>
> The following snippet should do that and solve the issue you
> mentioned:
> --- snip ---
> --- a/include/linux/sched/nohz.h
> +++ b/include/linux/sched/nohz.h
> @@ -9,8 +9,10 @@
> #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
> extern void nohz_balance_enter_idle(int cpu);
> extern int get_nohz_timer_target(void);
> +extern void nohz_clean_sd_state(int cpu);
> #else
> static inline void nohz_balance_enter_idle(int cpu) { }
> +static inline void nohz_clean_sd_state(int cpu) { }
> #endif
>
> #ifdef CONFIG_NO_HZ_COMMON
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b3e25be58e2b..6fcabe5d08f5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11525,6 +11525,9 @@ void nohz_balance_exit_idle(struct rq *rq)
> {
> SCHED_WARN_ON(rq != this_rq());
>
> + if (on_null_domain(rq))
> + return;
> +
> if (likely(!rq->nohz_tick_stopped))
> return;
>
If we force-clear rq->nohz_tick_stopped when detaching the domain, why
bother adding the first check?
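To make the question concrete, here is a minimal user-space sketch, not
kernel code. It assumes the detach path clears the flag the way
nohz_clean_sd_state() does in the quoted snippet; struct toy_rq,
toy_detach() and toy_exit_idle() are hypothetical stand-ins, and the toy
keys off the flag rather than idle_cpus_mask. Under that assumption the
existing !nohz_tick_stopped early return already fires for a detached
CPU, with or without the added on_null_domain() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rq; only the two fields discussed here. */
struct toy_rq {
	void *sd;               /* NULL models a CPU with no sched domain attached */
	int nohz_tick_stopped;
};

static int toy_nr_idle_cpus;    /* models nohz.nr_cpus */

/* Models the detach path: drop the domain and clear the NOHZ state once. */
static void toy_detach(struct toy_rq *rq)
{
	rq->sd = NULL;
	if (rq->nohz_tick_stopped) {
		rq->nohz_tick_stopped = 0;
		toy_nr_idle_cpus--;
	}
}

/*
 * Models nohz_balance_exit_idle(). Returns true only when the
 * bookkeeping past the early returns was reached.
 */
static bool toy_exit_idle(struct toy_rq *rq, bool check_null_domain)
{
	if (check_null_domain && rq->sd == NULL)
		return false;   /* the check added by the snippet */
	if (!rq->nohz_tick_stopped)
		return false;   /* the pre-existing early return */
	rq->nohz_tick_stopped = 0;
	toy_nr_idle_cpus--;
	return true;
}

/* After a detach, both variants return early and the counter is untouched. */
void toy_exit_idle_selfcheck(void)
{
	struct toy_rq rq = { .sd = &rq, .nohz_tick_stopped = 1 };

	toy_nr_idle_cpus = 1;
	toy_detach(&rq);
	assert(toy_nr_idle_cpus == 0);
	assert(!toy_exit_idle(&rq, true));
	assert(!toy_exit_idle(&rq, false));
	assert(toy_nr_idle_cpus == 0);
}
```

The two variants would only differ if something set nohz_tick_stopped
again after the detach, which this sketch does not model.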
>
> @@ -11551,6 +11554,17 @@ static void set_cpu_sd_state_idle(int cpu)
> rcu_read_unlock();
> }
>
> +void nohz_clean_sd_state(int cpu) {
> + struct rq *rq = cpu_rq(cpu);
> +
> + rq->nohz_tick_stopped = 0;
> + if (cpumask_test_cpu(cpu, nohz.idle_cpus_mask)) {
> + cpumask_clear_cpu(cpu, nohz.idle_cpus_mask);
> + atomic_dec(&nohz.nr_cpus);
> + }
> + set_cpu_sd_state_idle(cpu);
> +}
> +
detach_destroy_domains
  cpu_attach_domain
    update_top_cache_domain
Since we clear per_cpu(sd_llc, cpu) for the isolated CPU in
cpu_attach_domain(), set_cpu_sd_state_idle() seems to be a no-op here,
no?
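A minimal user-space sketch of that reasoning, not kernel code. It
assumes set_cpu_sd_state_idle() bails out when per_cpu(sd_llc, cpu) is
NULL; the toy_* types and functions are hypothetical stand-ins, and only
the field names nohz_idle and nr_busy_cpus are borrowed from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the sched domain structures. */
struct toy_sd_shared { int nr_busy_cpus; };
struct toy_sd { int nohz_idle; struct toy_sd_shared *shared; };

#define TOY_NR_CPUS 2
static struct toy_sd *toy_sd_llc[TOY_NR_CPUS];  /* models per_cpu(sd_llc, cpu) */

/* Models cpu_attach_domain(NULL, ...): the cached LLC domain becomes NULL. */
static void toy_attach_null_domain(int cpu)
{
	toy_sd_llc[cpu] = NULL;
}

/* Models set_cpu_sd_state_idle(); returns 1 if it changed state, 0 if it did nothing. */
static int toy_set_cpu_sd_state_idle(int cpu)
{
	struct toy_sd *sd = toy_sd_llc[cpu];

	if (!sd || sd->nohz_idle)
		return 0;
	sd->nohz_idle = 1;
	sd->shared->nr_busy_cpus--;
	return 1;
}

/* CPU 0 is detached first, so the call does nothing; CPU 1 still has a domain. */
void toy_sd_state_selfcheck(void)
{
	struct toy_sd_shared shared = { .nr_busy_cpus = 2 };
	struct toy_sd llc0 = { .nohz_idle = 0, .shared = &shared };
	struct toy_sd llc1 = { .nohz_idle = 0, .shared = &shared };

	toy_sd_llc[0] = &llc0;
	toy_sd_llc[1] = &llc1;

	toy_attach_null_domain(0);
	assert(toy_set_cpu_sd_state_idle(0) == 0);
	assert(shared.nr_busy_cpus == 2);

	assert(toy_set_cpu_sd_state_idle(1) == 1);
	assert(shared.nr_busy_cpus == 1);

	/* Do not leave pointers to locals behind. */
	toy_attach_null_domain(1);
}
```

In this model the call order matters: invoked after the NULL attach, the
helper never reaches nr_busy_cpus for the detached CPU.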
thanks,
rui
> /*
> * This routine will record that the CPU is going idle with tick stopped.
> * This info will be used in performing idle load balancing in the future.
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index d3a3b2646ec4..d31137b5f0ce 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2584,8 +2584,10 @@ static void detach_destroy_domains(const struct cpumask *cpu_map)
>
> static_branch_dec_cpuslocked(&sched_asym_cpucapacity);
>
> rcu_read_lock();
> - for_each_cpu(i, cpu_map)
> + for_each_cpu(i, cpu_map) {
> cpu_attach_domain(NULL, &def_root_domain, i);
> + nohz_clean_sd_state(i);
> + }
> rcu_read_unlock();
> }
>
> --- snip ---
>
> Regards,
> Pierre
>
> >
> > >
> > > > + }
> > > > +
> > > > /*
> > > > * The tick is still stopped but load could have been added in the
> > > > * meantime. We set the nohz.has_blocked flag to trig a check of the
> > > > @@ -11585,10 +11609,6 @@ void nohz_balance_enter_idle(int cpu)
> > > > if (rq->nohz_tick_stopped)
> > > > goto out;
> > > > - /* If we're a completely isolated CPU, we don't play: */
> > > > - if (on_null_domain(rq))
> > > > - return;
> > > > -
> > > > rq->nohz_tick_stopped = 1;
> > > > cpumask_set_cpu(cpu, nohz.idle_cpus_mask);
> > > >
> > > > Otherwise I could reproduce the issue and the patch was solving it,
> > > > so:
> > > > Tested-by: Pierre Gondois <pierre.gondois@....com>
> >
> > Thanks for testing, really appreciated!
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Also, your patch doesn't aim to solve that, but I think there is an
> > > > issue when updating cpuset.cpus when an isolated partition was
> > > > already created:
> > > >
> > > > // Create an isolated partition containing CPU0
> > > > # mkdir cgroup
> > > > # mount -t cgroup2 none cgroup/
> > > > # mkdir cgroup/Testing
> > > > # echo "+cpuset" > cgroup/cgroup.subtree_control
> > > > # echo "+cpuset" > cgroup/Testing/cgroup.subtree_control
> > > > # echo 0 > cgroup/Testing/cpuset.cpus
> > > > # echo isolated > cgroup/Testing/cpuset.cpus.partition
> > > >
> > > > // CPU0's sched domain is detached:
> > > > # ls /sys/kernel/debug/sched/domains/cpu0/
> > > > # ls /sys/kernel/debug/sched/domains/cpu1/
> > > > domain0 domain1
> > > >
> > > > // Change the isolated partition to be CPU1
> > > > # echo 1 > cgroup/Testing/cpuset.cpus
> > > >
> > > > // CPU[0-1] sched domains are not updated:
> > > > # ls /sys/kernel/debug/sched/domains/cpu0/
> > > > # ls /sys/kernel/debug/sched/domains/cpu1/
> > > > domain0 domain1
> > > >
> > Interesting. Let me check and get back to you later on this. :)
> >
> > thanks,
> > rui