linux-kernel - Re: [RFC PATCH] panic: fix deadlock in panic()

Message-ID: <20200604075922.GA143696@jagdpanzerIV.localdomain>
Date:   Thu, 4 Jun 2020 16:59:22 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky@...il.com>
To:     Cheng Jian <cj.chengjian@...wei.com>
Cc:     linux-kernel@...r.kernel.org, chenwandun@...wei.com,
        xiexiuqi@...wei.com, bobo.shaobowang@...wei.com,
        huawei.libin@...wei.com, pmladek@...e.com,
        sergey.senozhatsky@...il.com, rostedt@...dmis.org
Subject: Re: [RFC PATCH] panic: fix deadlock in panic()

On (20/06/03 14:19), Cheng Jian wrote:
> A deadlock on logbuf_lock can occur during panic:
> 
> 	a) Panic CPU is running in non-NMI context
> 	b) Panic CPU sends out shutdown IPI via NMI vector
> 	c) One of the CPUs that we bring down via NMI vector holds logbuf_lock
> 	d) Panic CPU tries to take logbuf_lock, and deadlock occurs.
> 
> We try to re-init the logbuf_lock in printk_safe_flush_on_panic()
> to avoid deadlock, but it does not work here, because:
> 
> Firstly, it is inappropriate to check num_online_cpus() here.
> When CPUs are brought down via NMI vector, the panic CPU won't
> wait too long for other cores to stop, so when this problem
> occurs, num_online_cpus() may be greater than 1.
> 
> Secondly, printk_safe_flush_on_panic() is called after the panic
> notifier callbacks, so if printk() is called in a panic notifier
> callback, the deadlock will still occur. E.g., if ftrace_dump_on_oops
> is set, we print some debug information, which will try to take the
> logbuf_lock.
> 
> To avoid this deadlock, drop the num_online_cpus() check and call
> printk_safe_flush_on_panic() before the panic_notifier_list callbacks,
> so that the panic CPU attempts to re-init logbuf_lock first.
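
For readers following along, steps a)-d) and the proposed re-init can be
modeled in userspace. This is only a rough sketch, not kernel code and not
the patch itself: model_lock, model_trylock(), model_lock_bounded() and
model_panic_can_print() are hypothetical names invented for this
illustration, and the spin is bounded so that the model terminates instead
of actually deadlocking.

```c
/* Userspace model only; every name here is hypothetical, not a kernel API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for logbuf_lock: a bare test-and-set spinlock. */
struct model_lock {
	atomic_flag held;
};

static void model_lock_init(struct model_lock *l)
{
	atomic_flag_clear(&l->held);
}

/* Returns true if the lock was free and is now taken by the caller. */
static bool model_trylock(struct model_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

/* Spin at most max_spins times rather than forever. */
static bool model_lock_bounded(struct model_lock *l, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		if (model_trylock(l))
			return true;
	}
	return false;
}

/*
 * Replays steps a)-d). Returns true if the panic CPU manages to take
 * the lock. reinit_first models re-initializing the lock from the
 * panic CPU before its first print attempt.
 */
static bool model_panic_can_print(bool reinit_first)
{
	struct model_lock logbuf = { ATOMIC_FLAG_INIT };
	bool ok;

	/* c) A remote CPU takes the lock, then is stopped and never unlocks. */
	ok = model_trylock(&logbuf);
	assert(ok);
	(void)ok;

	/* Modeled fix: unconditional re-init by the panic CPU. */
	if (reinit_first)
		model_lock_init(&logbuf);

	/* d) The panic CPU tries to take the lock in order to print. */
	return model_lock_bounded(&logbuf, 1000);
}
```

In this model, model_panic_can_print(false) returns false (the lock owner
is gone and never unlocks), while model_panic_can_print(true) returns true.
The model deliberately ignores what makes the real case hard: a stopped CPU
may have left the log buffer in an inconsistent state.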

We hopefully will get rid of some of these locks (around 5.9 kernel
maybe), so the deadlocks (at least in the printk-code) should become
less common.

	-ss