[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9531fc38cfebb5b4587967f6ec73d983fd9325ce.camel@gmail.com>
Date: Sun, 30 Apr 2023 19:00:42 +0200
From: calumlikesapplepie@...il.com
To: Petr Mladek <pmladek@...e.com>, Chris Down <chris@...isdown.name>
Cc: linux-kernel@...r.kernel.org,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Sergey Senozhatsky <senozhatsky@...omium.org>,
Steven Rostedt <rostedt@...dmis.org>,
John Ogness <john.ogness@...utronix.de>,
Geert Uytterhoeven <geert@...ux-m68k.org>, kernel-team@...com,
mj@...ey.karlin.mff.cuni.cz
Subject: Re: [PATCH v5 2/2] printk: console: Remove sysrq exception
Hullo! I'm a random college student; this is my first message to the
LKML, please don't be too mean!
On Thu, 2023-04-27 at 15:49 +0200, Petr Mladek wrote:
> On Tue 2023-04-25 14:06:26, Chris Down wrote:
> > Consoles can have vastly different latencies and throughputs. For
> > example, writing a message to the serial console can take on the order
> > of tens of milliseconds to get the UART to successfully write a message.
> > While this might be fine for a single, one-off message, this can cause
> > significant application-level stalls in situations where the kernel
> > writes large amounts of information to the console.
>
> > This means that while you might want to send at least INFO level
> > messages to (for example) netconsole, which is relatively fast, you may
> > only want to send at least WARN level messages to the serial console.
> > Such an implementation would permit debugging using the serial console
> > in cases that netconsole doesn't receive messages during particularly
> > bad system issues, while still keeping the noise low enough to avoid
> > inducing latency in userspace applications. This patch adds such an
> > interface, extending the existing console loglevel controls to allow
> > each console to have its own loglevel.
What I'm coming here to say isn't really perfectly relevant to this
goal, but it is related to the work you're doing to accomplish it.
Basically: sysrq interaction with the console is terrible, and is
causing problems for no good reasons as a result of being an edge case,
making me sad. Details on why I am sad are in the next two paragraphs,
but you can just skip them. In fact, I recommend it.
Yes, this email is the result of about two and a half hours of
dedicated procrastination right at the start of finals week. I blame
Steam for eating up all my RAM and swap to pre-cache Vulkan shaders,
whatever error handler froze my desktop after it did, kdump for
generating a nice crash report revealing that the sysrq's were
processed but that the OOM Killer was never invoked (despite a kworker
allocation failure), atopd for creating a log entry and proving the
system was alive and well after the freeze, and systemd-journald for
consuming 15% cpu in that entry but not actually syncing anything to
disk.
I suppose my decision to get said shaders ready for next week's jubilee
might be partly to blame. Or my habit of running two browsers with a
few hundred tabs apeice, a CAD program, and a half-dozen other things
at the same time. My assumption that sysrq headers print
unconditionally on all consoles means that they also probably count as
KERN_CRIT messages (which trigger an immediate journal sync) might not
have helped. I suppose if I'd been more patient my system might have
either recovered or finally gotten the OOM killer out, and been
responsive. And maybe if I'd known more sysrq's than 'emergency sync
to disk' and 'crash the kernel', or had chosen to trigger a mass SIGINT
rather than a crash, I wouldn't have embarked on this journey.
But really, I blame 1993 Linus Torvalids for coding up the sysrq
loglevel dance in the first place.
[snip]
> > --- a/drivers/tty/sysrq.c
> > +++ b/drivers/tty/sysrq.c
> > @@ -101,12 +102,26 @@ __setup("sysrq_always_enabled",
> > sysrq_always_enabled_setup);
> > static void sysrq_handle_loglevel(int key)
> > {
> > int i;
> > + int cookie;
> > + int warned = 0;
> > + struct console *con;
> >
> > i = key - '0';
> > console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
> > pr_info("Loglevel set to %d\n", i);
> > console_loglevel = i;
> > +
> > + cookie = console_srcu_read_lock();
> > + for_each_console_srcu(con) {
> > + if (!warned && per_console_loglevel_is_set(con)) {
> > + warned = 1;
> > + pr_warn("Overriding per-console loglevel
> > from sysrq\n");
> > + }
> > + con->level = -1;
>
> Please, use WRITE_ONCE(con->level, -1) to make sure that it will be
> atomic.
>
> READ_ONCE()/WRITE_ONCE() should always be used when the values
> are read/written using RCU synchronization. Otherwise the compiler
> might do some optimizations and read/write bytes separately.
>
I would argue we remove this whole logic, and just go to a nice and
simple printk at KERN_EMERG priority. Why add interactions with the
SRCU subsystem in code that we need to run even in the event of some
massive SRCU bug? This code will be both infrequently tested and make
a bad day worse if it's problematic.
> > + }
> > + console_srcu_read_unlock(cookie);
> > }
>
> Also we should safe/set/restore "ignore_per_console_loglevel"
> in __handle_sysrq(). It already does the same with
> "console_loglevel".
>
> __handle_sysrq() increases the loglevel to show all
> messages printed by the sysrq handler on all consoles.
> Many handlers print some information that might be useful
> for debugging.
Actually, it doesn't! Be kinda cool if it did, but there is a
legitimate reason for it not to: some sysrq messages, like the list of
all processes in the system, are extremely long, and so you might want
to print them to a file but not every console. Instead, it prints
exactly one message with the increased log level visibility: an info-
priority message that states the "action message" of the sysrq, before
returning to the default log level.
In other words: magic sysrq messages are printed to every console,
because they are deemed to be important emergencies that no user should
miss, even if they told the kernel it better be actively on fire if it
wants to talk. Cool. However, they are not marked as emergency
priority. Instead, they're marked as info priority, but do a dance
with kernel log level to ensure they get printed everywhere.
Why? See this thread, where they discuss adding the (kind-of vague)
comment that now exists in the code to indicate the reasoning.
https://lore.kernel.org/lkml/20090107123725.GC2520@shadowen.org/
Okay, so that explains why we do the song-and-dance only for the header
message. Still wondering why the printk at low priority? They
acknowledge it as something weird, but don't talk about why. So I dove
into git history, only to find that the log-level shifting code was
added with the initial import from bitkeeper. And then that, while
there was a change to the file in the bitkeeper history, the log level
shifting code predates it.
Uh oh.
So lets go back to May 31st, 1997, when the version of sysrq.c that
makes it into the 2.1.43 kernel is written by Martin Mares. Fun fact:
the email address he used then probably still works; I've added him to
the CC on this, since it was him who created the code. And... yup.
The log level shifting code is there, complete with the same variable
names. Sysrq.c does not exist in the previous kernel version.
https://mirrors.edge.kernel.org/pub/linux/kernel/v2.1/linux-2.1.43.tar.gz
No, I could not find the email where this patch was first submitted.
It may not exist on the internet; it certainly doesn't in the parts
where Google looks.
Now, printk did support priority levels at this point: without the
emails discussing the patch, therefore, it's impossible to say why
Martin decided all those years ago to use this technique. It's
probably a moot point anyways.
The fact that sysrq messages behave in this way isn't well documented,
and is definitely harmful. Users might (rightfully) assume that if
they set the log level to 1, they will not see any messages that are
not of EMERGENCY or ALERT priority. If they program on that
assumption, however, a sysrq might break their code: it will be shown
on their console, but break whatever boolean they have looking at
priority. The chance of that occurring will go up with these changes.
sysrq's behaving in this way, as info-level emergencies, is just weird.
No sane system uses a sysrq in normal running: its an awkward
interface, designed for use by a human, with pretty much all of the
keys causing some level of system-level disruption. The stated reason
doesn't really make sense; why is it important enough to be shown on
every console, but not important enough to be seen as an emergency by
the computer?
If your system experiences a sysrq, either you have some weird backup
software that is using the wrong interface, or someone with extremely
privileged access to your system believes that there is something so
fundamentally wrong with your system that they need to bypass the
entirety of userspace and much of the kernel to get something done.
Either of those situations are at least as important as a typo in a
password for sudo; which is given a CRITICAL priority.
Lets not add a pile of code in order to maintain a behavior that no
sane userspace will be depending on, and which might even be causing
bugs in sane userspaces. Like, for instance, systemd-journald deciding
not to write out journals when I instruct my kernel to do an emergency
sync.
>
> The following separate patches comes to my mind:
>
> 1. Fix printk_delay() call (already done, current 1st patch)
>
> 2. Pass @con instead of @is_extended/@..._schedule in
> in printk_get_next_message().
>
> 3. Add con->level and the logic that will use it in printk(),
> including console_effective_loglevel(). It will be
> temporary initialized to -1 in register_console().
>
> 4. Add ignore_per_console_loglevel parameter, use it
> in per_console_loglevel_is_set(), do_syslog(),
> and __handle_sysrq().
In other words: sysrq's use of the printk subsystem in this way is
unique, and thus almost certainly a bad idea.
> 5. Add sysfs interface that would allow to set per-console
> log level at runtime.
>
> 6. Add the command line interface that would allow to define
> the console loglevel via console= parameter.
>
> 7. Update the sysctl interface.
>
Powered by blists - more mailing lists