linux-kernel - Re: [PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing

Message-ID: <20180530083204.m2yvmm7mc6owvpdk@pathway.suse.cz>
Date:   Wed, 30 May 2018 10:32:04 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc:     Hoeun Ryu <hoeun.ryu@....com.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Hoeun Ryu <hoeun.ryu@....com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH]  printk: make printk_safe_flush safe in NMI context by
 skipping flushing

On Tue 2018-05-29 21:13:15, Sergey Senozhatsky wrote:
> On (05/29/18 11:51), Hoeun Ryu wrote:
> >  Make printk_safe_flush() safe in NMI context.
> > nmi_trigger_cpumask_backtrace() can be called in NMI context. For example the
> > function is called in watchdog_overflow_callback() if the flag of hardlockup
> > backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > watchdog_overflow_callback() function is called in NMI context on some
> > architectures.
> >  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually tries
> > to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be already locked in
> > preempted contexts (task or irq in this case) or by other CPUs and it may cause

The sentence "logbuf_lock can be already locked in preempted contexts" does not
make much sense. It is a spin lock, which means that both interrupts and
preemption are disabled while it is held.

I would change it to something like:

"Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually tries
to lock logbuf_lock in vprintk_emit(), which might already be part
of a soft- or hard-lockup on another CPU."

> > deadlocks.
> >  By making printk_safe_flush() safe in NMI context, the backtrace triggering CPU
> > just skips flushing if the lock is not available in NMI context. The messages in
> > per-cpu nmi buffer of the backtrace triggering CPU can be lost if the CPU is in
> > hard lockup (because irq is disabled here) but if panic() is not called. The
> > flushing can be delayed by the next irq work in normal cases.

I am missing the motivation for why the current state is better than
the previous one. It looks like we exchange the risk of a deadlock for
a risk of losing the messages.

I see it the following way:

"This patch prevents a deadlock in printk_safe_flush() in NMI
context. It makes sure that we continue and eventually call
printk_safe_flush_on_panic() from panic() that has better
chances to succeed.

There is a risk that logbuf_lock was not part of a soft- or
hard-lockup and we might just lose the messages. But then there is a high
chance that irq_work will get called and the messages will get flushed
the normal way."
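
To make the trade-off above concrete, here is a rough userspace model of
the "skip the flush in NMI when the lock is taken" idea. It is only a
sketch and not the patch itself: every name starting with model_ is
made up for illustration, a C11 atomic_flag stands in for logbuf_lock,
and a plain char array stands in for the main log buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: all model_* names are hypothetical, not kernel API. */
static atomic_flag model_logbuf_lock = ATOMIC_FLAG_INIT;

/* Stand-in for a per-CPU safe buffer. */
struct model_safe_buf {
	char   data[256];
	size_t len;
};

/* Stand-in for the main log buffer protected by the lock. */
static char   model_log[1024];
static size_t model_log_len;

static bool model_trylock(void)
{
	/* test_and_set returns the old value; false means we got the lock */
	return !atomic_flag_test_and_set(&model_logbuf_lock);
}

static void model_unlock(void)
{
	atomic_flag_clear(&model_logbuf_lock);
}

/* Store a message in the safe buffer without touching the lock. */
static void model_safe_store(struct model_safe_buf *b, const char *msg)
{
	size_t n = strlen(msg);

	assert(b != NULL);
	if (n > sizeof(b->data) - b->len)
		n = sizeof(b->data) - b->len;	/* truncate on overflow */
	memcpy(b->data + b->len, msg, n);
	b->len += n;
}

/*
 * Flush the safe buffer into the main log.
 * Returns true when flushed, false when the flush was skipped.
 */
static bool model_flush(struct model_safe_buf *b, bool in_nmi)
{
	size_t n;

	assert(b != NULL);
	if (in_nmi) {
		/* Never spin in NMI: the lock owner may be unable to run. */
		if (!model_trylock())
			return false;	/* messages stay in the safe buffer */
	} else {
		while (!model_trylock())
			;		/* normal context may wait */
	}

	n = b->len;
	if (n > sizeof(model_log) - model_log_len)
		n = sizeof(model_log) - model_log_len;
	memcpy(model_log + model_log_len, b->data, n);
	model_log_len += n;
	b->len = 0;

	model_unlock();
	return true;
}
```

In this model a skipped flush leaves the messages in the safe buffer, so
a later flush (from irq_work or from panic() in the real kernel) can
still pick them up. They are lost only if no later flush ever happens.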

> Any chance we can add more info to the commit message? E.g. backtraces
> which would describe "how" is this possible (like the one I posted in
> another email). Just to make it more clear.

I agree that a backtrace would be helpful. But it is not a must-have
from my point of view.

The patch itself looks good to me.

Best Regards,
Petr