linux-kernel - Re: [PATCH] kernel/printk: prevent deadlock at calling kmsg

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190715075444.fzxnf3iduqj6dkgp@pathway.suse.cz>
Date:   Mon, 15 Jul 2019 09:54:44 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc:     Konstantin Khlebnikov <koct9i@...il.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
Subject: Re: [PATCH] kernel/printk: prevent deadlock at calling kmsg_dump
 from NMI context

On Mon 2019-07-15 11:33:38, Sergey Senozhatsky wrote:
> On (07/13/19 17:03), Konstantin Khlebnikov wrote:
> > > We call kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()
> > 
> > Indeed, panic is especially handled and looks fine.

Just to get a picture. What other situations are we talking about,
please?

oops_exit() is one candidate. The other callers seem to be working
in normal context.


> > Sanity check in my patch could be relaxed:
> > 
> >        if (WARN_ON_ONCE(reason != KMSG_DUMP_PANIC && in_nmi()))
> >                return;
> 
> How critical kmsg_dump() is? We deadlock only if NMI->kmsg_dump()
> happens on the CPU which already holds the logbuf_lock; in any
> other case logbuf_lock is owned by another CPU which is expected
> to unlock it eventually. So it doesn't look like disabling all
> NMI->kmsg_dump() is the only way to fix it.
>
> When we lock logbuf_lock we increment per-CPU printk_context
> (PRINTK_SAFE_CONTEXT_MASK bits); when we unlock logbuf_lock
> we decrement printk_context. Thus we always can tell if the
> logbuf_lock was locked on the very same CPU - this_cpu printk_context
> has PRINTK_SAFE_CONTEXT_MASK bits sets - and we are about to deadlock
> in a nested context (NMI), or the lock was locked on another CPU and
> it's "safe" to spin on logbuf_lock and wait for logbuf_lock to become
> available.

This sounds familiar. I think that we did not consider it safe in the
end, see the commit 03fc7f9c99c1e7ae29 ("printk/nmi: Prevent deadlock
when accessing the main log buffer in NMI").

If the problem is only with Oops then the 2nd propose might be
acceptable. The system will either try to continue or it will end
up in panic() anyway.

Well, WARN() looks like an overkill, especially if there is only
one possible caller that prints the stack anyway. I would personally
do not print any message and do just:

	/*
	 * Prevent deadlock on logbuf_lock. It is safe only
	 * in panic() after smp_send_stop() and resetting
	 * the lock.
	 */
	if (in_nmi() && reason != KMSG_DUMP_PANIC)
		return;

Best Regards,
Petr