linux-kernel - Re: [PATCH RFC rcu] Stop rcu_tasks_invoke

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZEmhtdxpelt5jxAu@slm.duckdns.org>
Date:   Wed, 26 Apr 2023 12:12:05 -1000
From:   Tejun Heo <tj@...nel.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
        kernel-team@...a.com, rostedt@...dmis.org, riel@...riel.com
Subject: Re: [PATCH RFC rcu] Stop rcu_tasks_invoke_cbs() from using
 never-online CPUs

On Wed, Apr 26, 2023 at 10:26:38AM -0700, Paul E. McKenney wrote:
> The rcu_tasks_invoke_cbs() relies on queue_work_on() to silently fall
> back to WORK_CPU_UNBOUND when the specified CPU is offline.  However,
> the queue_work_on() function's silent fallback mechanism relies on that
> CPU having been online at some time in the past.  When queue_work_on()
> is passed a CPU that has never been online, workqueue lockups ensue,
> which can be bad for your kernel's general health and well-being.
> 
> This commit therefore checks whether a given CPU is currently online,
> and, if not substitutes WORK_CPU_UNBOUND in the subsequent call to
> queue_work_on().  Why not simply omit the queue_work_on() call entirely?
> Because this function is flooding callback-invocation notifications
> to all CPUs, and must deal with possibilities that include a sparse
> cpu_possible_mask.
> 
> Fixes: d363f833c6d88 rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations
> Reported-by: Tejun Heo <tj@...nel.org>
> Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
...
> +		// If a CPU has never been online, queue_work_on()
> +		// objects to queueing work on that CPU.  Approximate a
> +		// check for this by checking if the CPU is currently online.
> +
> +		cpus_read_lock();
> +		cpuwq1 = cpu_online(cpunext) ? cpunext : WORK_CPU_UNBOUND;
> +		cpuwq2 = cpu_online(cpunext + 1) ? cpunext + 1 : WORK_CPU_UNBOUND;
> +		cpus_read_unlock();
> +
> +		// Yes, either CPU could go offline here.  But that is
> +		// OK because queue_work_on() will (in effect) silently
> +		// fall back to WORK_CPU_UNBOUND for any CPU that has ever
> +		// been online.

Looks like cpus_read_lock() isn't protecting anything really.

> +		queue_work_on(cpuwq1, system_wq, &rtpcp_next->rtp_work);
>  		cpunext++;
>  		if (cpunext < smp_load_acquire(&rtp->percpu_dequeue_lim)) {
>  			rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext);
> -			queue_work_on(cpunext, system_wq, &rtpcp_next->rtp_work);
> +			queue_work_on(cpuwq2, system_wq, &rtpcp_next->rtp_work);

As discussed in the thread, I kinda wonder whether just using an unbound
workqueue would be sufficient but as a fix this looks good to me.

Acked-by: Tejun Heo <tj@...nel.org>

Thanks.

-- 
tejun