linux-kernel - Re: [PATCH] rcu: Use cpus_read_lock() while looking at cpu_online

Message-ID: <20180911162142.cc3vgook2gctus4c@linutronix.de>
Date:   Tue, 11 Sep 2018 18:21:42 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     linux-kernel@...r.kernel.org, Boqun Feng <boqun.feng@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
        tglx@...utronix.de, Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [PATCH] rcu: Use cpus_read_lock() while looking at
 cpu_online_mask

On 2018-09-11 09:05:32 [-0700], Paul E. McKenney wrote:
> On Mon, Sep 10, 2018 at 03:56:16PM +0200, Sebastian Andrzej Siewior wrote:
> > It was possible that sync_rcu_exp_select_cpus() enqueued something on
> > CPU0 while CPU0 was offline. Such a work item wouldn't be processed
> > until CPU0 gets back online. This problem was addressed in commit
> > fcc6354365015 ("rcu: Make expedited GPs handle CPU 0 being offline"). I
> > don't think the issue is fully addressed.
> > 
> > Assume grplo = 0 and grphi = 7 and sync_rcu_exp_select_cpus() is invoked
> > on CPU1. The preempt_disable() section on CPU1 won't ensure that CPU0
> > remains online between looking at cpu_online_mask and invoking
> > queue_work_on() on CPU1.
> > 
> > Use cpus_read_lock() to ensure that `cpu' is not going down between
> > looking at cpu_online_mask, invoking queue_work_on() and waiting for
> > its completion. It is added around the loop + flush_work() which is
> > similar to work_on_cpu_safe() (and we can have multiple jobs running on
> > NUMA systems).
> 
> Is this experimental or theoretical?

Theoretical. I saw that hunk on RT, and I can't have queue_work() within
a preempt_disable() section here.

> If theoretical, the counter-theory
> is that the stop-machine processing prevents any of the cpu_online_mask
> bits from changing, though, yes, we would like to get rid of the
> stop-machine processing.  So either way, yes, the current state could
> use some improvement.
> 
> But one problem with the patch below is that sync_rcu_exp_select_cpus()
> can be called while the cpu_hotplug_lock is write-held.  Or is that
> somehow OK these days?  

Depends. Is it okay to wait until the write-lock is dropped? If it is,
then it is okay. If not…

> Assuming not, how about the (untested) patch
> below?

Doesn't work for me because it is still within the preempt-disable
section :/.
Would it work to use WORK_CPU_UNBOUND? As far as I understand it, the
CPU number does not matter, you just want to spread it across multiple
CPUs in the NUMA case.

> 							Thanx, Paul
> 
> commit 5214cbbfe6a5d6b92c76c4e411a049fe57245d4a
> Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> Date:   Tue Sep 11 08:57:48 2018 -0700
> 
>     rcu: Stop expedited grace periods from relying on stop-machine
>     
>     The CPU-selection code in sync_rcu_exp_select_cpus() disables preemption
>     to prevent the cpu_online_mask from changing.  However, this relies on
>     the stop-machine mechanism in the CPU-hotplug offline code, which is not
>     desirable (it would be good to someday remove the stop-machine mechanism).

Not that I tested it, but I still don't understand how a
preempt_disable() section on CPU1 can ensure that CPU3 won't go down. Is
there some code that invokes stop_cpus() for each CPU or so?

Sebastian
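
The check-then-queue window discussed in the thread (grplo = 0, grphi = 7,
selection running on CPU1) can be illustrated with a small user-space model.
This is only a sketch, not kernel code: every model_* name is hypothetical,
the "lock" is a plain reader counter standing in for cpu_hotplug_lock, and
the hotplug attempt is simulated inline rather than coming from another CPU.
It shows the property the patch relies on: while the read side is held, an
offline attempt between looking at the mask and queueing the work cannot
succeed.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/* Hypothetical stand-ins for cpu_hotplug_lock and cpu_online_mask. */
static int model_readers;                      /* active read-side holders */
static unsigned int model_online_mask = 0xffu; /* CPUs 0-7 start online */
static int model_queued[MODEL_NR_CPUS];        /* work items queued per CPU */

static void model_cpus_read_lock(void)
{
	model_readers++;
}

static void model_cpus_read_unlock(void)
{
	assert(model_readers > 0);
	model_readers--;
}

/* Offlining needs the write side, so it fails while any reader is active. */
static bool model_cpu_down(int cpu)
{
	if (model_readers > 0)
		return false;
	model_online_mask &= ~(1u << cpu);
	return true;
}

/*
 * Pick the first online CPU in [grplo, grphi] and queue work on it, all
 * under the read lock. *raced reports whether a simulated hotplug attempt
 * in the window between the mask check and the queueing took effect.
 * Returns the chosen CPU, or -1 if none in the range is online.
 */
static int model_select_and_queue(int grplo, int grphi, bool *raced)
{
	int cpu, chosen = -1;

	*raced = false;
	model_cpus_read_lock();
	for (cpu = grplo; cpu <= grphi; cpu++) {
		if (model_online_mask & (1u << cpu)) {
			chosen = cpu;
			break;
		}
	}
	if (chosen >= 0) {
		/* Simulated attempt to offline the chosen CPU in the window. */
		*raced = model_cpu_down(chosen);
		model_queued[chosen]++;
	}
	model_cpus_read_unlock();
	return chosen;
}
```

The model does not cover the concern raised in the thread about callers that
already hold the hotplug lock for writing; in that case taking the read side
would have to wait, which is the open question above.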