Message-ID: <fe931d3a-bf97-4be5-8420-f1fcb55e6a46@paulmck-laptop>
Date: Fri, 14 Feb 2025 01:01:56 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Boqun Feng <boqun.feng@...il.com>,
Joel Fernandes <joel@...lfernandes.org>,
Neeraj Upadhyay <neeraj.upadhyay@....com>,
Uladzislau Rezki <urezki@...il.com>,
Zqiang <qiang.zhang1211@...il.com>, rcu <rcu@...r.kernel.org>
Subject: Re: [PATCH 3/3] rcu/exp: Remove needless CPU up quiescent state
report
On Fri, Feb 14, 2025 at 12:25:59AM +0100, Frederic Weisbecker wrote:
> A CPU coming online checks for an ongoing grace period and reports
> a quiescent state accordingly if needed. This special treatment, which
> shortcuts the expedited IPI, originated as an optimization in the
> following commit:
>
> 338b0f760e84 ("rcu: Better hotplug handling for synchronize_sched_expedited()")
>
> The point is to avoid an IPI to a CPU that is in the process of coming
> online, or that has failed to go offline.
>
> However, this is pointless and even error-prone for several reasons:
>
> * If the CPU was seen offline during the first round of scanning
> offline and idle CPUs, no IPI is even attempted and the quiescent
> state is reported on behalf of that CPU.
>
> * This means that if the IPI fails, the CPU has just gone offline. So
> it is unlikely to come back online right away, unless the CPU-hotplug
> operation failed and rolled back, which is a rare event that can
> afford to wait a jiffy for a new IPI to be issued.
>
> * And even then, the "optimization" for a failed CPU-hotplug-down
> operation applies only to !PREEMPT_RCU.
>
> * This forcibly reports a quiescent state even if ->cpu_no_qs.b.exp is
> not set. As a result it can race with remote QS reports on the same
> rdp. This happens to be harmless today, but it is an accident waiting
> to happen (see the sketch after this list).
>
> For all these reasons, remove this optimization, which does not look
> worth keeping around.
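> 
> To make the hazard in the last point concrete, here is a standalone
> userspace sketch (not kernel code). The names qs_needed, nr_reports and
> report_qs() are invented for illustration only; they merely stand in
> for ->cpu_no_qs.b.exp and the real report path, and show why a report
> should be gated on atomically clearing the per-CPU flag rather than
> forced unconditionally:
> 
> #include <pthread.h>
> #include <stdatomic.h>
> #include <stdbool.h>
> #include <stdio.h>
> 
> static atomic_bool qs_needed = true;  /* stand-in for ->cpu_no_qs.b.exp */
> static atomic_int nr_reports;         /* how many times a QS was reported */
> 
> /* Safe idiom: only the thread that clears the flag gets to report. */
> static void report_qs(const char *who)
> {
> 	if (atomic_exchange(&qs_needed, false))
> 		atomic_fetch_add(&nr_reports, 1);
> 	else
> 		printf("%s: QS already reported, nothing to do\n", who);
> }
> 
> static void *remote_reporter(void *arg)
> {
> 	report_qs("remote reporter");
> 	return NULL;
> }
> 
> int main(void)
> {
> 	pthread_t t;
> 
> 	pthread_create(&t, NULL, remote_reporter, NULL);
> 	/* A forced report that skipped the flag test would race here. */
> 	report_qs("onlining CPU");
> 	pthread_join(t, NULL);
> 	printf("reports: %d (must be exactly 1)\n", atomic_load(&nr_reports));
> 	return 0;
> }
> 
> Build with "gcc -pthread": whichever thread wins the atomic_exchange()
> performs the single report and the other backs off.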
Thank you for digging into this!
When I ran tests that removed the call to sync_sched_exp_online_cleanup()
a few months ago, I got grace-period hangs [1]. Has something changed
to make this safe?
Thanx, Paul
[1] https://docs.google.com/document/d/1-JQ4QYF1qid0TWSLa76O1kusdhER2wgm0dYdwFRUzU8/edit?usp=sharing
> Reported-by: Paul E. McKenney <paulmck@...nel.org>
> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> ---
> kernel/rcu/tree.c | 2 --
> kernel/rcu/tree_exp.h | 45 ++-----------------------------------------
> 2 files changed, 2 insertions(+), 45 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8625f616c65a..86935fe00397 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -151,7 +151,6 @@ static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp,
> unsigned long gps, unsigned long flags);
> static void invoke_rcu_core(void);
> static void rcu_report_exp_rdp(struct rcu_data *rdp);
> -static void sync_sched_exp_online_cleanup(int cpu);
> static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
> static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
> static bool rcu_rdp_cpu_online(struct rcu_data *rdp);
> @@ -4259,7 +4258,6 @@ int rcutree_online_cpu(unsigned int cpu)
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> return 0; /* Too early in boot for scheduler work. */
> - sync_sched_exp_online_cleanup(cpu);
>
> // Stop-machine done, so allow nohz_full to disable tick.
> tick_dep_clear(TICK_DEP_BIT_RCU);
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index caff16e441d1..a3f962eb7057 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -751,12 +751,8 @@ static void rcu_exp_handler(void *unused)
> struct task_struct *t = current;
>
> /*
> - * First, is there no need for a quiescent state from this CPU,
> - * or is this CPU already looking for a quiescent state for the
> - * current grace period? If either is the case, just leave.
> - * However, this should not happen due to the preemptible
> - * sync_sched_exp_online_cleanup() implementation being a no-op,
> - * so warn if this does happen.
> + * WARN if the CPU is unexpectedly already looking for a
> + * QS or has already reported one.
> */
> ASSERT_EXCLUSIVE_WRITER_SCOPED(rdp->cpu_no_qs.b.exp);
> if (WARN_ON_ONCE(!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> @@ -803,11 +799,6 @@ static void rcu_exp_handler(void *unused)
> WARN_ON_ONCE(1);
> }
>
> -/* PREEMPTION=y, so no PREEMPTION=n expedited grace period to clean up after. */
> -static void sync_sched_exp_online_cleanup(int cpu)
> -{
> -}
> -
> /*
> * Scan the current list of tasks blocked within RCU read-side critical
> * sections, printing out the tid of each that is blocking the current
> @@ -885,38 +876,6 @@ static void rcu_exp_handler(void *unused)
> rcu_exp_need_qs();
> }
>
> -/* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
> -static void sync_sched_exp_online_cleanup(int cpu)
> -{
> - unsigned long flags;
> - int my_cpu;
> - struct rcu_data *rdp;
> - int ret;
> - struct rcu_node *rnp;
> -
> - rdp = per_cpu_ptr(&rcu_data, cpu);
> - rnp = rdp->mynode;
> - my_cpu = get_cpu();
> - /* Quiescent state either not needed or already requested, leave. */
> - if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> - READ_ONCE(rdp->cpu_no_qs.b.exp)) {
> - put_cpu();
> - return;
> - }
> - /* Quiescent state needed on current CPU, so set it up locally. */
> - if (my_cpu == cpu) {
> - local_irq_save(flags);
> - rcu_exp_need_qs();
> - local_irq_restore(flags);
> - put_cpu();
> - return;
> - }
> - /* Quiescent state needed on some other CPU, send IPI. */
> - ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
> - put_cpu();
> - WARN_ON_ONCE(ret);
> -}
> -
> /*
> * Because preemptible RCU does not exist, we never have to check for
> * tasks blocked within RCU read-side critical sections that are
> --
> 2.46.0
>