Date:   Fri, 5 May 2023 12:39:36 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     calumlikesapplepie@...il.com
Cc:     Chris Down <chris@...isdown.name>, linux-kernel@...r.kernel.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        John Ogness <john.ogness@...utronix.de>,
        Geert Uytterhoeven <geert@...ux-m68k.org>, kernel-team@...com,
        mj@...ey.karlin.mff.cuni.cz

Subject: Re: [PATCH v5 2/2] printk: console: Remove sysrq exception
In-Reply-To: <bad7cc32729c153689a32ac5111eb2b7882963a6.camel@...il.com>

Adding people and mailing lists back to Cc.

On Tue 2023-05-02 21:37:52, calumlikesapplepie@...il.com wrote:
> On Tue, 2023-05-02 at 12:27 +0200, Petr Mladek wrote:
> > On Sun 2023-04-30 19:00:42, calumlikesapplepie@...il.com wrote:
> > It seems that the original code actually printed all messages with
> > the updated console_loglevel.
> > 
> > It was modified to print only the first line or the help text by:
> > commit 2433aae9cbfbe77b5c5af11e6174d390e06053a6
> > Author: linus1 <torvalds@...lon.transmeta.com>
> > 
> >     v2.4.10.1 -> v2.4.10.2
> > 
> > , see
> > https://github.com/mpe/linux-fullhistory/commit/2433aae9cbfbe77b5c5af11e6174d390e06053a6
> > 
> > I do not see any explanation for this. I guess that the overly
> > long output caused some problems.
> > 
> > IMHO, the debug output is not printed because it might be too
> > much for slow consoles. It might actually be an advantage to
> > distinguish the log level of the various messages. It allows
> > to filter the messages on the console.
> > 
> > sysrq does not know in which state the system is. It might be
> > called even on a normally running system. Will the EMERGENCY
> > or ALERT be correct in that case?
> 
> Based on the definitions laid out in include/linux/kern_levels.h, I
> think the fact that a sysrq has been triggered is worthy of CRITICAL
> status, perhaps even ALERT.  Every example use case for a sysrq is of a
> system that is not running normally; while you can trigger it via
> software (/proc/sysrq-trigger ), that isn't really the intended use.
> 
> This is a system that is explicitly designed to be used for manual
> interaction with the lowest levels of the operating system by a
> superuser.  Most alerts are to get an admin's attention for a problem;
> when a sysrq occurs, the admin is paying attention and is trying to
> solve a problem.  In short, a critical situation already exists.

If admins trigger sysrq then they do not need to be alerted by
a console loglevel.

IMHO, the main motivation for the console_loglevel change was
to give the admin feedback that the system was still alive enough
to start processing the sysrq.

Anyway, I do not want to continue long philosophical discussions
about motivations and expectations. The most important thing
is how to solve your problem without breaking things for others.

> > > If your system experiences a sysrq, either you have some weird backup
> > > software that is using the wrong interface
> > 
> > Is there any backup system doing this? Or is it just some wild theory?
> 
> I wish it were a wild theory, but there does seem to be a system doing
> this.  Someone should probably ask them to stop.  Maybe the backup
> system is a custom design for this one company's servers, but it does
> exist.
> 
> https://support.binarylane.com.au/support/solutions/articles/11000107835-what-is-sysrq-emergency-sync
> 
> > > , or someone with extremely
> > > privileged access to your system believes that there is something so
> > > fundamentally wrong with your system that they need to bypass the
> > > entirety of userspace and much of the kernel to get something done.
> > 
> > My understanding is that sysrq is primarily used when userspace
> > no longer works. IMHO, the original use-case was to
> > trigger it from the keyboard.
> > 
> 
> Yup.  However, it can also be triggered by writing to procfs as a
> superuser.
> 
> > > Either of those situations is at least as important as a typo in a
> > > password for sudo, which is given a CRITICAL priority.
> > > 
> > > Let's not add a pile of code in order to maintain a behavior that no
> > > sane userspace will be depending on, and which might even be causing
> > > bugs in sane userspaces.  Like, for instance, systemd-journald deciding
> > > not to write out journals when I instruct my kernel to do an emergency
> > > sync.
> > 
> > Honestly, I am not sure what your preferred behavior would be.
> > It might be because I am not a native speaker. And the mail is
> > really long.
> 
> It really is overly long, yeah.
> 
> > Is the problem that systemd-journald did not write the log?
> > Or is the problem that it ate 15% CPU?
> 
> That the log was not written out to disk immediately, despite a
> critical situation existing.  However, this isn't a journald bug. 
> Journald is documented as being perfectly happy to cache logs for up to
> 5 minutes by default, unless a sufficiently high priority message is
> received.  

Will it actually help to print the initial sysrq message with
higher priority? Could you try it, please?

> > Eating 15% CPU looks like a bug. The fact that it did not write
> > anything might be because of the OOM situation. Most things get
> > blocked when there is no memory.

> In my experience, journald eating 15% of CPU when things are going boom
> is perfectly reasonable.  The OOM situation would be causing many
> failing allocations in userspace, which journald would be trying to
> process.  

journald might actually have problems storing new messages even
in memory unless it has pre-allocated a big enough buffer.

Another question is how quickly it could store the huge amount
of data on disk when many operations are blocked or slowed down
by the OOM situation.

> I initially didn't think it failed to write due to the OOM situation; I
> left the system sitting for five minutes, and several other operations
> happily completed (the atop log was written, for instance).  However,
> looking closer at the log, it's clear that of that high CPU usage,
> almost none of it went to userspace: in fact, atop shows 0 seconds of
> USRCPU and 87.12 seconds of SYSCPU.  Atop has a far smaller virtual
> memory usage, and required less resident memory: thus, it seems likely
> it was able to squeeze in work where journald failed.
> 
> I suppose that, ultimately, the best solution to OOM causing problems
> is to avoid it occurring; I've installed systemd-oomd, to avoid this
> precise problem.  I'm sure there's some optimization to be made when
> the kernel starts thrashing swap: but that's all out of scope.  Even if
> this doesn't actually solve my specific problem, it will remove what I
> see as a wart.
> 
> > What exact sysrq behavior would you suggest, please?
> 
> Do not adjust console_loglevel to print out the single sysrq header
> line.  Instead, print that line out at an appropriate priority
> (KERN_CRIT, perhaps).  Print the remaining lines at their current
> priorities.

Could you please check whether it really helps in your case?
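
To make the comparison concrete, here is a small userspace model of the
two behaviors. It is only a sketch, not kernel code. It assumes that the
sysrq header is currently logged at KERN_INFO while console_loglevel is
temporarily raised to 7, and that journald syncs immediately only for
messages stored at CRIT or more severe, as its documentation describes.
All function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Severity numbers as in include/linux/kern_levels.h (lower = more severe). */
enum kern_level {
    LVL_EMERG, LVL_ALERT, LVL_CRIT, LVL_ERR,
    LVL_WARNING, LVL_NOTICE, LVL_INFO, LVL_DEBUG
};

/* Assumption: value that console_loglevel is temporarily raised to. */
#define TEMP_CONSOLE_LOGLEVEL 7

/* A message reaches the console only if its level is numerically
 * lower than console_loglevel. */
bool console_shows(int msg_level, int console_loglevel)
{
    assert(msg_level >= LVL_EMERG && msg_level <= LVL_DEBUG);
    return msg_level < console_loglevel;
}

/* Current behavior (simplified): the admin's console_loglevel is
 * overridden while the header is printed; the header stays INFO. */
bool header_shown_current(int console_loglevel)
{
    (void)console_loglevel; /* ignored during the header print */
    return console_shows(LVL_INFO, TEMP_CONSOLE_LOGLEVEL);
}

/* Proposed behavior: leave console_loglevel alone, log header at CRIT. */
bool header_shown_proposed(int console_loglevel)
{
    return console_shows(LVL_CRIT, console_loglevel);
}

/* Assumption taken from the journald documentation: an immediate sync
 * happens only for CRIT, ALERT, or EMERG messages. */
bool journald_syncs_immediately(int stored_level)
{
    return stored_level <= LVL_CRIT;
}
```

In this model, with console_loglevel set to 4, both variants show the
header on the console, but only the proposed one stores it at a level
that would make journald sync right away. With console_loglevel at 2 or
lower, the proposed variant no longer reaches the console at all, which
is the kind of behavior change that someone might see as a regression.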

> While I mentioned emergency priority previously, I did not realize that
> it could result in the kernel dumping immediately. 
> 
> At the very least, we should increase the priority of that printline,
> so it is less out of place.
> 
> > > > 4. Add ignore_per_console_loglevel parameter, use it
> > > >    in per_console_loglevel_is_set(), do_syslog(),
> > > >    and __handle_sysrq().
> > > 
> > > In other words: sysrq's use of the printk subsystem in this way is
> > > unique, and thus almost certainly a bad idea.
> > 
> > sysrq is a very old interface. Various people might expect different
> > behavior depending on the use case. It might be impossible to
> > make all people happy.
> > 
> > Changing the default behavior in a significant way might be seen as a regression.
> > Especially, printing all messages with EMERGENCY loglevel looks like
> > a pretty bad idea because it would prevent any filtering on the
> > console level.
> 
> That's true: however, it's also an interface that is intentionally
> awkward and limiting to use.  The only sysrqs that even HAVE a use when
> a system is in normal operation (ie, not being played with by a kernel
> developer or actively freezing up) are those that change log levels,
> and the sync filesystem call.  All the others, if called on a system
> you are actually trying to do work with, will result in either
> meaningless debug information dumps more easily acquired through other
> means, or will actively disrupt whatever you're trying to do.
> 
> > My feeling is that your primary problem is somewhere else,
> > systemd-journald or OOM behavior.
> 
> Probably.

Let's first check whether changing the loglevel actually helps.

Anyway, I suggest avoiding OOM in the first place. Maybe
use cgroups to limit the hungriest processes so that
they do not break the system.
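
As a rough illustration of the cgroup idea: with cgroup v2, a hard
memory limit is set by writing a byte count into the memory.max file of
the group. The helper below is only a sketch. The function name is made
up, and the cgroup directory is whatever group you have created for the
hungry processes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a hard memory limit, in bytes, into <cgroup_dir>/memory.max.
 * Returns 0 on success and -1 on any failure. */
int set_memory_max(const char *cgroup_dir, unsigned long long bytes)
{
    char path[4096];
    FILE *f;
    int n, ok;

    assert(cgroup_dir != NULL);
    if (strlen(cgroup_dir) == 0)
        return -1;

    /* Build the path and reject it if it does not fit the buffer. */
    n = snprintf(path, sizeof(path), "%s/memory.max", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fprintf(f, "%llu\n", bytes) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

For example, set_memory_max("/sys/fs/cgroup/hungry", 2147483648ULL)
would cap a hypothetical "hungry" group at 2 GiB. This needs sufficient
privileges and the memory controller enabled for that group via the
parent's cgroup.subtree_control.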

Best Regards,
Petr
