linux-kernel - Re: [PATCH, RFC, tip/core/rcu] v3 scalable classic RCU implementation

Date:	Sun, 31 Aug 2008 12:58:12 +0200
From:	Manfred Spraul <manfred@...orfullife.com>
To:	paulmck@...ux.vnet.ibm.com
CC:	Lai Jiangshan <laijs@...fujitsu.com>, linux-kernel@...r.kernel.org,
	cl@...ux-foundation.org, mingo@...e.hu, akpm@...ux-foundation.org,
	dipankar@...ibm.com, josht@...ux.vnet.ibm.com, schamp@....com,
	niv@...ibm.com, dvhltc@...ibm.com, ego@...ibm.com,
	rostedt@...dmis.org, peterz@...radead.org
Subject: Re: [PATCH, RFC, tip/core/rcu] v3 scalable classic RCU implementation

Paul E. McKenney wrote:
>
>> Perhaps it's possible to rely on CPU_DYING, but I haven't figured out yet 
>> how to handle read-side critical sections in CPU_DYING handlers.
>> Interrupts after CPU_DYING could be handled by rcu_irq_enter(), 
>> rcu_irq_exit() [yes, they exist on x86: the arch code enables the local 
>> interrupts in order to process the currently queued interrupts]
>>     
>
> My feeling is that CPU online/offline will be quite rare, so it should
> be OK to clean up after the races in force_quiescent_state(), which in
> this version is called every three ticks in a given grace period.
>   
If you add failing cpu offline calls, then the problem appears to be 
unsolvable:
If I get it right, the offlining process looks like this:
* one cpu in the system makes the CPU_DOWN_PREPARE notifier call. These 
calls can sleep (e.g. slab sleeps on semaphores). The cpu that goes 
offline is still alive, still doing arbitrary work. cpu_quiet calls on 
behalf of the cpu would be wrong.
* stop_machine: all cpus schedule to a special kernel thread [1], only 
the dying cpu runs.
* The cpu that goes offline calls the CPU_DYING notifiers.
* __cpu_disable(): The cpu that goes offline checks whether it's possible to 
offline the cpu. At least on i386, this can fail.
On success:
* at least on i386: the cpu that goes offline handles outstanding 
interrupts. I'm not sure, perhaps even softirqs are handled.
* the cpu stops handling interrupts.
* stop machine leaves, the remaining cpus continue their work.
* The CPU_DEAD notifiers are called. They can sleep.
On failure:
* all cpus continue their work. call_rcu, synchronize_rcu(), ...
* some time later: the CPU_DOWN_FAILED callbacks are called.
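
A minimal user-space sketch of the sequence as I understand it. This is 
not kernel code: model_cpu_down(), struct hp_trace and notify() are 
made-up names, the enum values are arbitrary, and disable_ok merely 
stands in for the result of __cpu_disable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Notifier events named in this mail; the numeric values are arbitrary. */
enum hp_event {
    CPU_DOWN_PREPARE,
    CPU_DYING,
    CPU_DEAD,
    CPU_DOWN_FAILED,
};

#define MAX_EVENTS 4

/* Hypothetical trace of the notifier calls seen during one offline attempt. */
struct hp_trace {
    enum hp_event ev[MAX_EVENTS];
    size_t n;
};

static void notify(struct hp_trace *t, enum hp_event e)
{
    assert(t->n < MAX_EVENTS);
    t->ev[t->n++] = e;
}

/*
 * Model of the offline path as described above (an assumption, not the
 * real implementation). Returns true if the cpu went offline.
 */
static bool model_cpu_down(struct hp_trace *t, bool disable_ok)
{
    t->n = 0;
    notify(t, CPU_DOWN_PREPARE);    /* may sleep, target cpu still runs */
    /* stop_machine starts here: only the dying cpu runs */
    notify(t, CPU_DYING);           /* sent before the outcome is known */
    if (!disable_ok) {
        /* all cpus continue their work, the failure is reported later */
        notify(t, CPU_DOWN_FAILED);
        return false;
    }
    /* outstanding interrupts are handled, stop_machine leaves */
    notify(t, CPU_DEAD);            /* may sleep */
    return true;
}
```

In this model both outcomes deliver CPU_DYING as the second event: 
model_cpu_down(&t, true) ends with CPU_DEAD, model_cpu_down(&t, false) 
ends with CPU_DOWN_FAILED.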

Is that description correct?
Then:
- treating a cpu as always quiet after the rcu notifier was called with 
CPU_DOWN_PREPARE is wrong: the target cpu still runs normal code: 
user space, kernel space, interrupts, whatever. The target cpu still 
accepts interrupts, thus treating it as "normal" should work.
__cpu_disable() success:
- after CPU_DYING, a cpu is either in an interrupt or outside read-side 
critical sections. Parallel synchronize_rcu() calls are impossible until 
the cpu is dead. call_rcu() is probably possible.
- The CPU_DEAD notifiers are called. A synchronize_rcu() call before the 
rcu notifier is called is possible.
__cpu_disable() failure:
- CPU_DYING is called, but the cpu remains fully alive. The system comes 
fully alive again.
- some time later, CPU_DOWN_FAILED is called.

With the current CPU_DYING callback, it's impossible to be both 
deadlock-free and race-free with the given conditions. If 
__cpu_disable() succeeds, then the cpu must be treated as gone and 
always idle. If __cpu_disable() fails, then the cpu must be treated as 
fully there. Doing both things at the same time is impossible. Waiting 
until CPU_DOWN_FAILED or CPU_DEAD is called is impossible, too: Either 
synchronize_rcu() in a CPU_DEAD notifier [called before the rcu 
notifier] would deadlock or read-side critical sections on the 
not-killed cpu would race.

What about moving the CPU_DYING notifier calls behind the 
__cpu_disable() call?
Any other solutions?
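
To make the reordering idea concrete, here is a user-space sketch under 
the same caveats: model_cpu_down_reordered(), struct hp_trace, notify() 
and trace_has() are made-up names, and disable_ok stands in for the 
result of __cpu_disable(). It only shows the ordering, it says nothing 
about whether the other CPU_DYING users could live with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Notifier events named in this mail; the numeric values are arbitrary. */
enum hp_event { CPU_DOWN_PREPARE, CPU_DYING, CPU_DEAD, CPU_DOWN_FAILED };

#define MAX_EVENTS 4

/* Hypothetical trace of the notifier calls seen during one offline attempt. */
struct hp_trace {
    enum hp_event ev[MAX_EVENTS];
    size_t n;
};

static void notify(struct hp_trace *t, enum hp_event e)
{
    assert(t->n < MAX_EVENTS);
    t->ev[t->n++] = e;
}

/* Returns true if event e was delivered during the attempt. */
static bool trace_has(const struct hp_trace *t, enum hp_event e)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->ev[i] == e)
            return true;
    return false;
}

/*
 * Proposed order (an assumption, not existing code): CPU_DYING is sent
 * only after __cpu_disable() has succeeded, so a notifier that sees it
 * may treat the cpu as gone.
 */
static bool model_cpu_down_reordered(struct hp_trace *t, bool disable_ok)
{
    t->n = 0;
    notify(t, CPU_DOWN_PREPARE);
    if (!disable_ok) {
        /* the cpu stays fully alive and never saw CPU_DYING */
        notify(t, CPU_DOWN_FAILED);
        return false;
    }
    notify(t, CPU_DYING);   /* the point of no return is already passed */
    notify(t, CPU_DEAD);
    return true;
}
```

With this order a failed attempt produces only CPU_DOWN_PREPARE and 
CPU_DOWN_FAILED, so the rcu notifier would never have to undo a 
"cpu is gone" decision.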

Btw, as far as I can see, rcupreempt would deadlock if a CPU_DEAD 
notifier uses synchronize_rcu().
Probably no one will ever succeed in triggering the deadlock:
- cpu goes offline.
- the other cpus in the system are restarted.
- one cpu does the CPU_DEAD notifier calls.
- before the rcu notifier is called with CPU_DEAD:
- one CPU_DEAD notifier sleeps.
- while CPU_DEAD is sleeping: on the same cpu: kmem_cache_destroy is 
called. get_online_cpus immediately succeeds.
- kmem_cache_destroy acquires the cache_chain_mutex.
- kmem_cache_destroy does synchronize_rcu(), it sleeps.
- CPU_DEAD processing continues, the slab CPU_DEAD notifier tries to 
acquire the cache_chain_mutex. It sleeps, too.
--> deadlock, because the already dead cpu will never signal itself as 
quiet. Thus synchronize_rcu() will never succeed, thus the slab CPU_DEAD 
notifier will never return, thus rcu_offline_cpu() is never called.
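
The chain above can be written down as a small wait-for graph. This is 
only a user-space illustration of the argument: the actor names, the 
waits_for[] table and in_wait_cycle() are made up for this sketch, and 
the edges simply encode the steps listed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Actors in the scenario; names are illustrative only. */
enum actor {
    KMEM_CACHE_DESTROY, /* holds cache_chain_mutex, sleeps in synchronize_rcu() */
    GRACE_PERIOD,       /* still waits for the dead cpu to be reported quiet */
    RCU_CPU_DEAD,       /* rcu notifier, would call rcu_offline_cpu() */
    SLAB_CPU_DEAD,      /* slab notifier, wants cache_chain_mutex */
    NR_ACTORS,
};

/* waits_for[a] is the actor that a is blocked on, or -1 if a can run. */
static const int waits_for[NR_ACTORS] = {
    [KMEM_CACHE_DESTROY] = GRACE_PERIOD,
    [GRACE_PERIOD]       = RCU_CPU_DEAD,
    [RCU_CPU_DEAD]       = SLAB_CPU_DEAD,  /* slab notifier runs first */
    [SLAB_CPU_DEAD]      = KMEM_CACHE_DESTROY,
};

/* Follow the wait chain from start; true if it leads back to start. */
static bool in_wait_cycle(const int *graph, int n, int start)
{
    assert(start >= 0 && start < n);
    int cur = graph[start];

    for (int steps = 0; steps < n && cur >= 0; steps++) {
        if (cur == start)
            return true;
        cur = graph[cur];
    }
    return false;
}
```

in_wait_cycle(waits_for, NR_ACTORS, KMEM_CACHE_DESTROY) reports a 
cycle. If the rcu notifier did not have to wait for the slab notifier 
(waits_for[RCU_CPU_DEAD] set to -1), the chain would end there and the 
grace period could complete.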

--
    Manfred
[1] open question: with rcu_preempt, is it possible that these cpus 
could be inside read side critical sections?