linux-kernel - Re: [PATCH] kernel/printk: prevent deadlock at calling kmsg

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <45b818b7-045d-2d65-327f-217b752accee@yandex-team.ru>
Date:   Mon, 15 Jul 2019 11:12:34 +0300
From:   Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To:     Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc:     Konstantin Khlebnikov <koct9i@...il.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] kernel/printk: prevent deadlock at calling kmsg_dump from
 NMI context

On 15.07.2019 10:54, Petr Mladek wrote:
> On Mon 2019-07-15 11:33:38, Sergey Senozhatsky wrote:
>> On (07/13/19 17:03), Konstantin Khlebnikov wrote:
>>>> We call kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()
>>>
>>> Indeed, panic is especially handled and looks fine.
> 
> Just to get a picture. What other situations are we talking about,
> please?
> 
> oops_exit() is one candidate. The other callers seem to be working
> in normal context.

Oops in NMI mostly. Also I've screwed up several times with NMI watchdog
and dumping log at setting taint.

> 
> 
>>> Sanity check in my patch could be relaxed:
>>>
>>>         if (WARN_ON_ONCE(reason != KMSG_DUMP_PANIC && in_nmi()))
>>>                 return;
>>
>> How critical kmsg_dump() is? We deadlock only if NMI->kmsg_dump()
>> happens on the CPU which already holds the logbuf_lock; in any
>> other case logbuf_lock is owned by another CPU which is expected
>> to unlock it eventually. So it doesn't look like disabling all
>> NMI->kmsg_dump() is the only way to fix it.
>>
>> When we lock logbuf_lock we increment per-CPU printk_context
>> (PRINTK_SAFE_CONTEXT_MASK bits); when we unlock logbuf_lock
>> we decrement printk_context. Thus we always can tell if the
>> logbuf_lock was locked on the very same CPU - this_cpu printk_context
>> has PRINTK_SAFE_CONTEXT_MASK bits sets - and we are about to deadlock
>> in a nested context (NMI), or the lock was locked on another CPU and
>> it's "safe" to spin on logbuf_lock and wait for logbuf_lock to become
>> available.
> 
> This sounds familiar. I think that we did not consider it safe in the
> end, see the commit 03fc7f9c99c1e7ae29 ("printk/nmi: Prevent deadlock
> when accessing the main log buffer in NMI").
> 
> If the problem is only with Oops then the 2nd propose might be
> acceptable. The system will either try to continue or it will end
> up in panic() anyway.
> 
> Well, WARN() looks like an overkill, especially if there is only
> one possible caller that prints the stack anyway. I would personally
> do not print any message and do just:
> 
> 	/*
> 	 * Prevent deadlock on logbuf_lock. It is safe only
> 	 * in panic() after smp_send_stop() and resetting
> 	 * the lock.
> 	 */
> 	if (in_nmi() && reason != KMSG_DUMP_PANIC)
> 		return;
> 

WARN_ON_ONCE is useful timesaver in debugging.
It's better to know when happens something that shouldn't.

> Best Regards,
> Petr
>