[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d3958637-a03d-f899-63ef-8d0d50e78a1e@yandex-team.ru>
Date: Mon, 15 Jul 2019 10:38:07 +0300
From: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Konstantin Khlebnikov <koct9i@...il.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Petr Mladek <pmladek@...e.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH] kernel/printk: prevent deadlock at calling kmsg_dump from
NMI context
On 15.07.2019 5:33, Sergey Senozhatsky wrote:
> On (07/13/19 17:03), Konstantin Khlebnikov wrote:
>>> We call kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop() and after
>>> printk_safe_flush_on_panic(). printk_safe_flush_on_panic() resets
>>> the state of logbuf_lock, so logbuf_lock, in general case, should
>>> be unlocked by the time we call kmsg_dump(KMSG_DUMP_PANIC).
>>> Even for nested contexts.
>>>
>>> CPU0
>>> printk()
>>> logbuf_lock_irqsave(flags)
>>> -> NMI
>>> panic()
>>> smp_send_stop()
>>> printk_safe_flush_on_panic()
>>> raw_spin_lock_init(&logbuf_lock) << reinit >>
>>> kmsg_dump(KMSG_DUMP_PANIC)
>>> logbuf_lock_irqsave(flags) << expected to be OK >>
>>>
>>> So do we have strong reasons to disable NMI->panic->kmsg_dump(DUMP_PANIC)?
>>>
>>> Other kmsg_dump(), maybe, can experience some troubles sometimes,
>>> need to check that.
>>
>> Indeed, panic is especially handled and looks fine.
>>
>> Sanity check in my patch could be relaxed:
>>
>> if (WARN_ON_ONCE(reason != KMSG_DUMP_PANIC && in_nmi()))
>> return;
>
> How critical kmsg_dump() is? We deadlock only if NMI->kmsg_dump()
> happens on the CPU which already holds the logbuf_lock; in any
> other case logbuf_lock is owned by another CPU which is expected
> to unlock it eventually. So it doesn't look like disabling all
> NMI->kmsg_dump() is the only way to fix it.
>
> When we lock logbuf_lock we increment per-CPU printk_context
> (PRINTK_SAFE_CONTEXT_MASK bits); when we unlock logbuf_lock
> we decrement printk_context. Thus we always can tell if the
> logbuf_lock was locked on the very same CPU - this_cpu printk_context
> has PRINTK_SAFE_CONTEXT_MASK bits sets - and we are about to deadlock
> in a nested context (NMI), or the lock was locked on another CPU and
> it's "safe" to spin on logbuf_lock and wait for logbuf_lock to become
> available.
I see no users who dumps dmesg from NMI context except panic.
This shouldn't happen. But might happen is something goes very wrong.
So, this trickery is not required.
Also spinning in NMI for handling non-fatal cases is a bad idea.
Powered by blists - more mailing lists