[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAEXW_YS_CXbvhdu3EoaDB2zj3YGNUYXQ3C6xGOJ1cyvUJZZYjw@mail.gmail.com>
Date: Wed, 15 Feb 2023 09:51:34 -0500
From: Joel Fernandes <joel@...lfernandes.org>
To: "Zhang, Qiang1" <qiang1.zhang@...el.com>
Cc: Frederic Weisbecker <frederic@...nel.org>,
Nicholas Piggin <npiggin@...il.com>,
"paulmck@...nel.org" <paulmck@...nel.org>,
"quic_neeraju@...cinc.com" <quic_neeraju@...cinc.com>,
"Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>,
"rcu@...r.kernel.org" <rcu@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] rcu: Keeping rcu-related kthreads running on
housekeeping CPUS
On Fri, Feb 10, 2023 at 12:26 AM Zhang, Qiang1 <qiang1.zhang@...el.com> wrote:
>
> > For kernels built with CONFIG_NO_HZ_FULL=y and CONFIG_RCU_NOCB_CPU=y,
> > when passing cpulist to "isolcpus=", "nohz_full=" and "rcu_nocbs="
> > bootparams, after system starting, the rcu-related kthreads(include
> > rcu_gp, rcuog*, rcuop* kthreads etc) will running on housekeeping
> > CPUs, but for cpulist contains CPU0, the result will deferent, these
> > rcu-related kthreads will be restricted to running on CPU0.
> >
> > Although invoke kthread_create() to spwan rcu-related kthreads and
> > when it's starting, invoke set_cpus_allowed_ptr() to allowed cpumaks
> > is housekeeping_cpumask(HK_TYPE_KTHREAD), but due to these rcu-related
> > kthreads are created before starting other CPUS, that is to say, at
> > this time, only CPU0 is online, when these rcu-related kthreads running
> > and set allowed cpumaks is housekeeping cpumask, if find that only CPU0
> > is online and CPU0 exists in "isolcpus=", "nohz_full=" and "rcu_nocbs="
> > bootparams, invoke set_cpus_allowed_ptr() will return error.
> >
> > set_cpus_allowed_ptr()
> > ->__set_cpus_allowed_ptr()
> > ->__set_cpus_allowed_ptr_locked
> > {
> > bool kthread = p->flags & PF_KTHREAD;
> > ....
> > if (kthread || is_migration_disabled(p))
> > cpu_valid_mask = cpu_online_mask;
> > ....
> > dest_cpu = cpumask_any_and_distribute(cpu_valid_mask, ctx->new_mask);
> > if (dest_cpu >= nr_cpu_ids) {
> > ret = -EINVAL;
> > goto out;
> > }
> > ....
> > }
> >
> > At this time, only CPU0 is set in the cpu_online_mask, the ctx->new_mask
> > is housekeeping cpumask and not contains CPU0, this will result dest_cpu
> > is illegal cpu value, the set_cpus_allowed_ptr() will return -EINVAL and
> > failed to set housekeeping cpumask.
> >
> > This commit therefore add additional cpus_allowed_ptr() call in CPU hotplug
> > path. and reset the CPU affinity of rcuboost, rcuog, rcuop kthreads after
> > all other CPUs are online.
> >
> > Signed-off-by: Zqiang <qiang1.zhang@...el.com>
> >
> >Good catch! But based on that and your other fix, I suspect that
> >nohz_full=0... has never been seriously used.
> >
> >A few points:
> >
> >* This is a problem for kthreads in general. And since HK_TYPE_KTHREAD =
> > HK_TYPE_RCU and both are going to be merged in the future, I think we should
> > stop handling the RCU kthreads housekeeping affinity from RCU but let the
> > kthread code handle that and also fix the issue from the kthread code.
> > RCU boost may be an exception since we try to enforce some node locality
> > within the housekeeping range.
>
> Agree. indeed, these works that set housekeeping CPU affinity should not be handled by RCU,
> and not only RCU-related kthreads are affected, other kthreads created earlier also have the
> same problem.
Agreed as well.
> >* If CPU 0 is isolated and it is the boot CPU, we should wait for a secondary
> > CPU to boot before activating nohz_full at all. Unfortunately the nohz_full
> > code is not yet ready for runtime housekeeping cpumask change but work is
> > in progress (I'm saying that for 10 years...)
> >
> >* I'm tempted to revert 08ae95f4fd3b (nohz_full: Allow the boot CPU to be
> > nohz_full) since it doesn't work and nobody ever complained?
>
> Yes if remove 08ae95f4fd3b, this problem will disappear.
Just checking. So what's the next step, revert this?
Thanks.
Powered by blists - more mailing lists