[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <45b818b7-045d-2d65-327f-217b752accee@yandex-team.ru>
Date: Mon, 15 Jul 2019 11:12:34 +0300
From: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To: Petr Mladek <pmladek@...e.com>,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc: Konstantin Khlebnikov <koct9i@...il.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] kernel/printk: prevent deadlock at calling kmsg_dump from
NMI context
On 15.07.2019 10:54, Petr Mladek wrote:
> On Mon 2019-07-15 11:33:38, Sergey Senozhatsky wrote:
>> On (07/13/19 17:03), Konstantin Khlebnikov wrote:
>>>> We call kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()
>>>
>>> Indeed, panic is especially handled and looks fine.
>
> Just to get a picture. What other situations are we talking about,
> please?
>
> oops_exit() is one candidate. The other callers seem to be working
> in normal context.
Oops in NMI mostly. Also I've screwed up several times with NMI watchdog
and dumping log at setting taint.
>
>
>>> Sanity check in my patch could be relaxed:
>>>
>>> if (WARN_ON_ONCE(reason != KMSG_DUMP_PANIC && in_nmi()))
>>> return;
>>
>> How critical kmsg_dump() is? We deadlock only if NMI->kmsg_dump()
>> happens on the CPU which already holds the logbuf_lock; in any
>> other case logbuf_lock is owned by another CPU which is expected
>> to unlock it eventually. So it doesn't look like disabling all
>> NMI->kmsg_dump() is the only way to fix it.
>>
>> When we lock logbuf_lock we increment per-CPU printk_context
>> (PRINTK_SAFE_CONTEXT_MASK bits); when we unlock logbuf_lock
>> we decrement printk_context. Thus we always can tell if the
>> logbuf_lock was locked on the very same CPU - this_cpu printk_context
>> has PRINTK_SAFE_CONTEXT_MASK bits sets - and we are about to deadlock
>> in a nested context (NMI), or the lock was locked on another CPU and
>> it's "safe" to spin on logbuf_lock and wait for logbuf_lock to become
>> available.
>
> This sounds familiar. I think that we did not consider it safe in the
> end, see the commit 03fc7f9c99c1e7ae29 ("printk/nmi: Prevent deadlock
> when accessing the main log buffer in NMI").
>
> If the problem is only with Oops then the 2nd propose might be
> acceptable. The system will either try to continue or it will end
> up in panic() anyway.
>
> Well, WARN() looks like an overkill, especially if there is only
> one possible caller that prints the stack anyway. I would personally
> do not print any message and do just:
>
> /*
> * Prevent deadlock on logbuf_lock. It is safe only
> * in panic() after smp_send_stop() and resetting
> * the lock.
> */
> if (in_nmi() && reason != KMSG_DUMP_PANIC)
> return;
>
WARN_ON_ONCE is useful timesaver in debugging.
It's better to know when happens something that shouldn't.
> Best Regards,
> Petr
>
Powered by blists - more mailing lists