Message-ID: <fe931d3a-bf97-4be5-8420-f1fcb55e6a46@paulmck-laptop>
Date: Fri, 14 Feb 2025 01:01:56 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Boqun Feng <boqun.feng@...il.com>,
Joel Fernandes <joel@...lfernandes.org>,
Neeraj Upadhyay <neeraj.upadhyay@....com>,
Uladzislau Rezki <urezki@...il.com>,
Zqiang <qiang.zhang1211@...il.com>, rcu <rcu@...r.kernel.org>
Subject: Re: [PATCH 3/3] rcu/exp: Remove needless CPU up quiescent state
report
On Fri, Feb 14, 2025 at 12:25:59AM +0100, Frederic Weisbecker wrote:
> A CPU coming online checks for an ongoing grace period and reports
> a quiescent state accordingly if needed. This special treatment, which
> shortcuts the expedited IPI, originated as an optimization in the
> following commit:
>
> 338b0f760e84 ("rcu: Better hotplug handling for synchronize_sched_expedited()")
>
> The point is to avoid an IPI to a CPU that is in the process of coming
> online, or that has failed to go offline.
>
> However, this is pointless and even error-prone for several reasons:
>
> * If the CPU was seen offline during the first round of scanning
> offline and idle CPUs, no IPI is even attempted and the quiescent
> state is reported on behalf of that CPU.
>
> * This means that if the IPI fails, the CPU has just gone offline. So
> it is unlikely to come back online right away, unless the CPU-hotplug
> operation failed and rolled back, which is a rare event that can
> afford to wait a jiffy for a new IPI to be issued.
>
> * And even then, the "optimization" for a failed CPU-hotplug-down
> operation applies only to !PREEMPT_RCU.
>
> * This forcibly reports a quiescent state even if ->cpu_no_qs.b.exp is
> not set. As a result it can race with remote QS reports on the same
> rdp. This happens to be harmless today, but it is an accident waiting
> to happen (see the sketch after this list).
>
> For all these reasons, remove this optimization, which does not look
> worth keeping around.
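> 
> To make the hazard in the last point concrete, here is a standalone
> userspace sketch (not kernel code). The names qs_needed, nr_reports and
> report_qs() are invented for illustration only; they merely stand in
> for ->cpu_no_qs.b.exp and the real report path, and show why a report
> should be gated on atomically clearing the per-CPU flag rather than
> forced unconditionally:
> 
> #include <pthread.h>
> #include <stdatomic.h>
> #include <stdbool.h>
> #include <stdio.h>
> 
> static atomic_bool qs_needed = true;  /* stand-in for ->cpu_no_qs.b.exp */
> static atomic_int nr_reports;         /* how many times a QS was reported */
> 
> /* Safe idiom: only the thread that clears the flag gets to report. */
> static void report_qs(const char *who)
> {
> 	if (atomic_exchange(&qs_needed, false))
> 		atomic_fetch_add(&nr_reports, 1);
> 	else
> 		printf("%s: QS already reported, nothing to do\n", who);
> }
> 
> static void *remote_reporter(void *arg)
> {
> 	report_qs("remote reporter");
> 	return NULL;
> }
> 
> int main(void)
> {
> 	pthread_t t;
> 
> 	pthread_create(&t, NULL, remote_reporter, NULL);
> 	/* A forced report that skipped the flag test would race here. */
> 	report_qs("onlining CPU");
> 	pthread_join(t, NULL);
> 	printf("reports: %d (must be exactly 1)\n", atomic_load(&nr_reports));
> 	return 0;
> }
> 
> Build with "gcc -pthread": whichever thread wins the atomic_exchange()
> performs the single report and the other backs off.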
Thank you for digging into this!
When I ran tests that removed the call to sync_sched_exp_online_cleanup()
a few months ago, I got grace-period hangs [1]. Has something changed
to make this safe?
Thanx, Paul
[1] https://docs.google.com/document/d/1-JQ4QYF1qid0TWSLa76O1kusdhER2wgm0dYdwFRUzU8/edit?usp=sharing
> Reported-by: Paul E. McKenney <paulmck@...nel.org>
> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> ---
> kernel/rcu/tree.c | 2 --
> kernel/rcu/tree_exp.h | 45 ++-----------------------------------------
> 2 files changed, 2 insertions(+), 45 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8625f616c65a..86935fe00397 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -151,7 +151,6 @@ static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp,
> unsigned long gps, unsigned long flags);
> static void invoke_rcu_core(void);
> static void rcu_report_exp_rdp(struct rcu_data *rdp);
> -static void sync_sched_exp_online_cleanup(int cpu);
> static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
> static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
> static bool rcu_rdp_cpu_online(struct rcu_data *rdp);
> @@ -4259,7 +4258,6 @@ int rcutree_online_cpu(unsigned int cpu)
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> return 0; /* Too early in boot for scheduler work. */
> - sync_sched_exp_online_cleanup(cpu);
>
> // Stop-machine done, so allow nohz_full to disable tick.
> tick_dep_clear(TICK_DEP_BIT_RCU);
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index caff16e441d1..a3f962eb7057 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -751,12 +751,8 @@ static void rcu_exp_handler(void *unused)
> struct task_struct *t = current;
>
> /*
> - * First, is there no need for a quiescent state from this CPU,
> - * or is this CPU already looking for a quiescent state for the
> - * current grace period? If either is the case, just leave.
> - * However, this should not happen due to the preemptible
> - * sync_sched_exp_online_cleanup() implementation being a no-op,
> - * so warn if this does happen.
> + * WARN if the CPU is unexpectedly already looking for a
> + * QS or has already reported one.
> */
> ASSERT_EXCLUSIVE_WRITER_SCOPED(rdp->cpu_no_qs.b.exp);
> if (WARN_ON_ONCE(!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> @@ -803,11 +799,6 @@ static void rcu_exp_handler(void *unused)
> WARN_ON_ONCE(1);
> }
>
> -/* PREEMPTION=y, so no PREEMPTION=n expedited grace period to clean up after. */
> -static void sync_sched_exp_online_cleanup(int cpu)
> -{
> -}
> -
> /*
> * Scan the current list of tasks blocked within RCU read-side critical
> * sections, printing out the tid of each that is blocking the current
> @@ -885,38 +876,6 @@ static void rcu_exp_handler(void *unused)
> rcu_exp_need_qs();
> }
>
> -/* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
> -static void sync_sched_exp_online_cleanup(int cpu)
> -{
> - unsigned long flags;
> - int my_cpu;
> - struct rcu_data *rdp;
> - int ret;
> - struct rcu_node *rnp;
> -
> - rdp = per_cpu_ptr(&rcu_data, cpu);
> - rnp = rdp->mynode;
> - my_cpu = get_cpu();
> - /* Quiescent state either not needed or already requested, leave. */
> - if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> - READ_ONCE(rdp->cpu_no_qs.b.exp)) {
> - put_cpu();
> - return;
> - }
> - /* Quiescent state needed on current CPU, so set it up locally. */
> - if (my_cpu == cpu) {
> - local_irq_save(flags);
> - rcu_exp_need_qs();
> - local_irq_restore(flags);
> - put_cpu();
> - return;
> - }
> - /* Quiescent state needed on some other CPU, send IPI. */
> - ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
> - put_cpu();
> - WARN_ON_ONCE(ret);
> -}
> -
> /*
> * Because preemptible RCU does not exist, we never have to check for
> * tasks blocked within RCU read-side critical sections that are
> --
> 2.46.0
>