Message-ID: <YGXV8LJarjUJDhvy@alley>
Date: Thu, 1 Apr 2021 16:17:20 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org,
Michael Ellerman <mpe@...erman.id.au>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Eric Biederman <ebiederm@...ssion.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Nicholas Piggin <npiggin@...il.com>,
Alistair Popple <alistair@...ple.id.au>,
Jordan Niethe <jniethe5@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Kees Cook <keescook@...omium.org>,
Tiezhu Yang <yangtiezhu@...ngson.cn>,
Alexey Kardashevskiy <aik@...abs.ru>,
Yue Hu <huyue2@...ong.com>, Rafael Aquini <aquini@...hat.com>,
"Guilherme G. Piccoli" <gpiccoli@...onical.com>,
"Paul E. McKenney" <paulmck@...nel.org>,
linuxppc-dev@...ts.ozlabs.org, kexec@...ts.infradead.org
Subject: Re: [PATCH printk v2 2/5] printk: remove safe buffers
On Thu 2021-04-01 15:19:52, John Ogness wrote:
> On 2021-04-01, Petr Mladek <pmladek@...e.com> wrote:
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -1142,24 +1128,37 @@ void __init setup_log_buf(int early)
> >> new_descs, ilog2(new_descs_count),
> >> new_infos);
> >>
> >> - printk_safe_enter_irqsave(flags);
> >> + local_irq_save(flags);
> >
> > IMHO, we actually do not have to disable IRQ here. We already copy
> > messages that might appear in the small race window in NMI. It would
> > work the same way also for IRQ context.
>
> We do not have to, but why open up this window? We are still in early
> boot and interrupts have always been disabled here. I am not happy that
> this window even exists. I really prefer to keep it NMI-only.
Fair enough.
> >> --- a/lib/nmi_backtrace.c
> >> +++ b/lib/nmi_backtrace.c
> >> @@ -75,12 +75,6 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
> >> touch_softlockup_watchdog();
> >> }
> >>
> >> - /*
> >> - * Force flush any remote buffers that might be stuck in IRQ context
> >> - * and therefore could not run their irq_work.
> >> - */
> >> - printk_safe_flush();
> >
> > Sigh, this reminds me that the nmi_safe buffers serialized backtraces
> > from all CPUs.
> >
> > I am afraid that we have to put back the spinlock into
> > nmi_cpu_backtrace().
>
> Please no. That spinlock is a disaster. It can cause deadlocks with
> other cpu-locks (such as in kdb)
Could you please explain the kdb case in more detail?
I am curious which locks might depend on each other here.
> and it will cause a major problem for atomic consoles.
AFAIK, you are going to add a special lock that would allow
nesting on the same CPU. It should be possible and safe
to use it also for synchronizing the backtraces here.
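To make the idea concrete, here is a rough userspace sketch of a lock that allows nesting on the same CPU. All names are hypothetical and it is not the lock from the planned patches. It assumes that the caller passes its own CPU number and cannot migrate while holding the lock (in the kernel that would mean interrupts disabled), so only an NMI on the same CPU can nest.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; a userspace illustration, not kernel code. */
#define NO_OWNER (-1)

struct cpu_reentrant_lock {
	atomic_int owner;	/* CPU number of the holder, or NO_OWNER */
	int nested;		/* nesting depth, touched only by the owner */
};

static void cpu_lock_init(struct cpu_reentrant_lock *l)
{
	atomic_init(&l->owner, NO_OWNER);
	l->nested = 0;
}

/* Try once. Returns true if @cpu holds the lock afterwards. */
static bool cpu_trylock(struct cpu_reentrant_lock *l, int cpu)
{
	int expected = NO_OWNER;

	/* Already the owner (e.g. NMI on the same CPU): just nest. */
	if (atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu) {
		l->nested++;
		return true;
	}

	/* Otherwise take the lock only if nobody owns it. */
	return atomic_compare_exchange_strong_explicit(&l->owner, &expected,
			cpu, memory_order_acquire, memory_order_relaxed);
}

static void cpu_lock(struct cpu_reentrant_lock *l, int cpu)
{
	while (!cpu_trylock(l, cpu))
		;	/* spin; kernel code would call cpu_relax() here */
}

static void cpu_unlock(struct cpu_reentrant_lock *l)
{
	/* Leave a nested section first; release only at the outermost level. */
	if (l->nested) {
		l->nested--;
		return;
	}
	atomic_store_explicit(&l->owner, NO_OWNER, memory_order_release);
}
```

A CPU that already owns the lock never spins on itself, so a nested printk on that CPU cannot deadlock. Other CPUs still wait, which is exactly the serialization that the backtraces would need.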
> We need to be very careful about introducing locks
> where NMIs are waiting on other CPUs.
I agree.
> > It has been repeatedly added and removed depending
> > on whether the backtrace was printed into the main log buffer
> > or into the per-CPU buffers. Last time it was removed by
> > the commit 03fc7f9c99c1e7ae2925d ("printk/nmi: Prevent deadlock
> > when accessing the main log buffer in NMI").
> >
> > It should be safe because there should not be any other locks in the
> > code path. Note that only one backtrace might be triggered at the same
> > time, see @backtrace_flag in nmi_trigger_cpumask_backtrace().
>
> It is adding a lock around a lockless ringbuffer. For me that is a step
> backwards.
>
> > We _must_ serialize it somehow[*]. The lock in nmi_cpu_backtrace()
> > looks less evil than the nmi_safe machinery. nmi_safe() truncates
> > overly long backtraces, loses timestamps, needs to be explicitly
> > flushed here and there, and is non-trivial code.
> >
> > [*] Non-serialized backtraces are a real mess. Caller-id is visible
> > only on consoles or via the syslogd interface. And it is not very
> > convenient.
>
> Caller-id solves this problem and is easy to sort for anyone with
> `grep'. Yes, it is a shame that `dmesg' does not show it, but directly
> using any of the printk interfaces does show it (kmsg_dump, /dev/kmsg,
> syslog, console).
True, but frankly, the current situation is _far_ from convenient:
+ consoles do not show it by default
+ no userspace tool (dmesg, journalctl, crash) is able to show it
+ grep is a nightmare, especially if you have more than a handful of CPUs
Yes, everything is solvable but not easily.
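For illustration only, this is roughly what a tool would have to do per line to sort the output by caller. The sketch assumes console lines that carry both a timestamp and a caller field, such as "[   12.345678][   C1] NMI backtrace for cpu 1", where 'C' stands for a CPU and 'T' for a task. The function name is made up.

```c
#include <stdio.h>

/*
 * Hypothetical helper: extract the caller field from one console line.
 * Returns 0 and fills @type ('C' or 'T') and @id on success, or -1 when
 * the line does not start with "[timestamp][caller]".
 */
static int parse_caller(const char *line, char *type, unsigned int *id)
{
	char close;

	/* Skip "[timestamp]", then read "[", padding, C or T, number, "]". */
	if (sscanf(line, "[%*[^]]][ %c%u%c", type, id, &close) != 3)
		return -1;
	if (close != ']' || (*type != 'C' && *type != 'T'))
		return -1;
	return 0;
}
```

A filter would call this on every line and keep only the lines with a matching type and id, once per CPU of interest.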
> > I get this with "echo l >/proc/sysrq-trigger" and this patchset:
>
> Of course. Without caller-id, it is a mess. But this has nothing to do
> with NMI. The same problem exists for WARN_ON() on multiple CPUs
> simultaneously. If the user is not using caller-id, they are
> lost. Caller-id is the current solution to the interlaced logs.
Sure. But in reality, the risk of mixed WARN_ONs is small, whereas
this patch makes backtraces from all CPUs always unusable without
caller_id and non-trivial effort.
> For the long term, we should introduce a printk-context API that allows
> callers to perfectly pack their multi-line output into a single
> entry. We discussed [0][1] this back in August 2020.
We need a short-term solution. There are currently 3 solutions:
1. Keep nmi_safe() and all the hacks around it.
2. Serialize nmi_cpu_backtrace() with a spin lock and later with
the special lock that is also used by atomic consoles.
3. Tell complaining people how to sort the mixed-up logs.
My preference:
I prefer the 2nd solution most, until I see a realistic scenario
of a possible deadlock with the current kernel code.
I would still prefer the 1st solution over the 3rd one until we improve
the kernel/userspace support for sorting the log by the caller id.
Best Regards,
Petr