Message-ID: <CAHk-=wiKTn-BMpp4w645XqmFBEtUXe84+TKc6aRMPbvFwUjA=A@mail.gmail.com>
Date: Thu, 8 Aug 2019 12:07:28 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: John Ogness <john.ogness@...utronix.de>
Cc: Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Petr Mladek <pmladek@...e.com>,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Andrea Parri <andrea.parri@...rulasolutions.com>,
Thomas Gleixner <tglx@...utronix.de>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Brendan Higgins <brendanhiggins@...gle.com>
Subject: Re: [RFC PATCH v4 9/9] printk: use a new ringbuffer implementation
On Wed, Aug 7, 2019 at 3:27 PM John Ogness <john.ogness@...utronix.de> wrote:
>
> 2. For the CONFIG_PPC_POWERNV powerpc platform, kernel log buffer
> registration is no longer available because there is no longer
> a single contiguous block of memory to represent all of the
> ringbuffer.
So this is tangential, but I've actually been wishing for a special
"raw dump" format that has absolutely *no* structure to it at all, and
is as a result not necessarily strictly reliable, but is a lot more
robust.
The background for that is that we have a class of bugs that are
really hard to debug "in the wild", because people don't have access
to serial consoles or any kind of special hardware at all (ie forget
things like nvram etc), and when the machine locks up you're happy to
just have a reset button (but more likely you have to turn power off
and on).
End result: a DRAM buffer can work, but is not "reliable".
Particularly if you turn power on and off, data retention of DRAM is
iffy. But it's possible, at least in theory.
So I have a patch that implements a "stupid ring buffer" for this
case, with absolutely zero data structures (because in the presence of
DRAM corruption, all you can get is "hopefully only slightly garbled
ASCII").
It actually does work. It's a complete hack, but I have used this on
real hardware to see dumps that happened after the machine could no
longer send them to any device.
I actually suspect that this kind of "stupid non-structured secondary
log" can often be much more useful than the existing nvram special
cases - yes the output can be garbled for multi-cpu cases because it
not only is lockless, it's lockless without even any data structures -
but it also works somewhat reliably when the machine is _really_
borked. Which is exactly when you want a log that isn't just the
normal "working machine syslog".
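To make the idea concrete, here is a minimal user-space sketch of such a
structure-free buffer. It is NOT the attached patch, just an illustration
under assumptions: the names (raw_log, raw_log_pos, raw_log_write,
raw_log_sanitize) and the tiny 16-byte size are made up, and a plain static
array stands in for the reserved physical memory area. The write position
lives outside the buffer, so the buffer itself holds nothing but text, and
writers only do an atomic add to pick where their bytes go.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RAW_LOG_SIZE 16	/* hypothetical; must be a power of two */

/* Stand-in for the reserved DRAM area: raw text only, no headers. */
static char raw_log[RAW_LOG_SIZE];

/* Write position, kept outside the buffer (assumption of this sketch). */
static atomic_size_t raw_log_pos;

/*
 * Lockless append: reserve a byte range with one atomic add, then copy.
 * Concurrent writers can interleave or overwrite each other once the
 * buffer wraps, which is the accepted "garbled" failure mode.
 */
void raw_log_write(const char *s, size_t len)
{
	size_t start = atomic_fetch_add_explicit(&raw_log_pos, len,
						 memory_order_relaxed);

	for (size_t i = 0; i < len; i++)
		raw_log[(start + i) & (RAW_LOG_SIZE - 1)] = s[i];
}

/*
 * Post-mortem side: nothing to parse, so just replace anything that is
 * not printable ASCII or a newline with '?' and read what survived.
 */
void raw_log_sanitize(char *out, const char *raw, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)raw[i];

		out[i] = (c == '\n' || (c >= 0x20 && c < 0x7f)) ? (char)c : '?';
	}
}
```

Because no length fields, sequence numbers or pointers are stored in the
buffer, a flipped bit costs you one character instead of the whole log.
The price is that a reader has to guess where the newest text ends.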
NOTE! This is *not* a replacement for a lockless printk. This is very
much an _additional_ "low overhead buffer in RAM" for post-mortem
analysis when anything fancier doesn't work.
So I'm throwing this patch out there in case people have interest in
looking at that very special case. Also note how right now the example
code just steals a random physical memory area at roughly physical
location 12GB - this is a hack and would need to be configurable
obviously in real life, but it worked for the machines I tested (which
both happened to have 16GB of RAM).
Those parts are marked with "// HACK HACK HACK" and just a hardcoded
physical address (0x320000000).
Linus
View attachment "0001-Trial-power-off-buffer-for-printk-data-retention.patch" of type "text/x-patch" (8286 bytes)