lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a047c7dd-7ac9-4b03-807b-1f77473321bc@paulmck-laptop>
Date:   Wed, 26 Apr 2023 15:29:19 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Tejun Heo <tj@...nel.org>
Cc:     rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
        kernel-team@...a.com, rostedt@...dmis.org, riel@...riel.com
Subject: Re: [PATCH RFC rcu] Stop rcu_tasks_invoke_cbs() from using
 never-online CPUs

On Wed, Apr 26, 2023 at 12:12:05PM -1000, Tejun Heo wrote:
> On Wed, Apr 26, 2023 at 10:26:38AM -0700, Paul E. McKenney wrote:
> > The rcu_tasks_invoke_cbs() relies on queue_work_on() to silently fall
> > back to WORK_CPU_UNBOUND when the specified CPU is offline.  However,
> > the queue_work_on() function's silent fallback mechanism relies on that
> > CPU having been online at some time in the past.  When queue_work_on()
> > is passed a CPU that has never been online, workqueue lockups ensue,
> > which can be bad for your kernel's general health and well-being.
> > 
> > This commit therefore checks whether a given CPU is currently online,
> > and, if not substitutes WORK_CPU_UNBOUND in the subsequent call to
> > queue_work_on().  Why not simply omit the queue_work_on() call entirely?
> > Because this function is flooding callback-invocation notifications
> > to all CPUs, and must deal with possibilities that include a sparse
> > cpu_possible_mask.
> > 
> > Fixes: d363f833c6d88 rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations
> > Reported-by: Tejun Heo <tj@...nel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> ...
> > +		// If a CPU has never been online, queue_work_on()
> > +		// objects to queueing work on that CPU.  Approximate a
> > +		// check for this by checking if the CPU is currently online.
> > +
> > +		cpus_read_lock();
> > +		cpuwq1 = cpu_online(cpunext) ? cpunext : WORK_CPU_UNBOUND;
> > +		cpuwq2 = cpu_online(cpunext + 1) ? cpunext + 1 : WORK_CPU_UNBOUND;
> > +		cpus_read_unlock();
> > +
> > +		// Yes, either CPU could go offline here.  But that is
> > +		// OK because queue_work_on() will (in effect) silently
> > +		// fall back to WORK_CPU_UNBOUND for any CPU that has ever
> > +		// been online.
> 
> Looks like cpus_read_lock() isn't protecting anything really.

It certainly isn't protecting much.  ;-)

The purpose is to avoid a situation where this CPU thinks that some other
CPU is online, but the corresponding workqueue has not yet been created.
And I freely admit that, given the huge amount of synchronization and
delay on the CPU-hotplug path, it is really hard to imagine this getting
messed up.  Except that this is the quick fix, where it would be bad to
rely on anything requiring much thought.

I am working a more dainty fix for mainline that doesn't need
cpus_read_lock(), even in nightmarish theorizing.

> > +		queue_work_on(cpuwq1, system_wq, &rtpcp_next->rtp_work);
> >  		cpunext++;
> >  		if (cpunext < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
> >  			rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
> > -			queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
> > +			queue_work_on(cpuwq2, system_wq, &rtpcp_next->rtp_work);
> 
> As discussed in the thread, I kinda wonder whether just using an unbound
> workqueue would be sufficient but as a fix this looks good to me.

I would agree except for the callback-flood corner case, in which
callback invocation needs all the help it can get if it is to keep up
with a tightish loop doing call_rcu_tasks_*().

> Acked-by: Tejun Heo <tj@...nel.org>

I will apply this on my next rebase, thank you!

							Thanx, Paul

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ