linux-kernel - Re: [PATCH] cpu/hotplug: fix rollback during error-out in __cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160406195133.GB3485@osiris>
Date:	Wed, 6 Apr 2016 21:51:33 +0200
From:	Heiko Carstens <heiko.carstens@...ibm.com>
To:	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Sebastian Andrzej Siewior <sebastian.siewior@...utronix.de>,
	linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
	rt@...utronix.de, Martin Schwidefsky <schwidefsky@...ibm.com>,
	Anna-Maria Gleixner <anna-maria@...utronix.de>
Subject: Re: [PATCH] cpu/hotplug: fix rollback during error-out in
 __cpu_disable()

On Tue, Apr 05, 2016 at 05:59:04PM +0200, Sebastian Andrzej Siewior wrote:
> If we error out in __cpu_disable() (via takedown_cpu() which is
> currently the last one that can fail) we don't rollback entirely to
> CPUHP_ONLINE (where we started) but to CPUHP_AP_ONLINE_IDLE. This
> happens because the former states were on the target CPU (the AP states)
> and during the rollback we go back until the first BP state we started.
> During the next cpu_down attempt (on the same failed CPU) will take
> forever because the cpuhp thread is still down.
> 
> The fix this I rollback to where we started in _cpu_down() via a workqueue
> to ensure that those callback will be run on the target CPU in
> non-atomic context (as in normal cpu_up()).
> The workqueues should be working again because the CPU_DOWN_FAILED were
> already invoked.
> 
> notify_online() has been marked as ->skip_onerr because otherwise we
> will see the CPU_ONLINE notifier in addition to the CPU_DOWN_FAILED.
> However with ->skip_onerr we neither see CPU_ONLINE nor CPU_DOWN_FAILED
> if something in between (CPU_DOWN_FAILED … CPUHP_TEARDOWN_CPU).
> Currently there is nothing.
> 
> This regression got probably introduce in the rework while we introduced
> the hotplug thread to offload the work to the target CPU.
> 
> Fixes: 4cb28ced23c4 ("cpu/hotplug: Create hotplug threads")
> Reported-by: Heiko Carstens <heiko.carstens@...ibm.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> ---
>  kernel/cpu.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)

This fixes the issue that a second cpu_down() will take forever, if
__cpu_disable() fails.

However it does not fix the issue that CPU_DOWN_FAILED will be seen on a
different cpu than the cpu that was supposed to be taken offline.