linux-kernel - Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices


Message-ID: <YZIx+ZBEKnVHSnbO@alley>
Date:   Mon, 15 Nov 2021 11:10:01 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     Wander Costa <wcosta@...hat.com>
Cc:     Wander Lairson Costa <wander@...hat.com>,
        Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        John Ogness <john.ogness@...utronix.de>,
        open list <linux-kernel@...r.kernel.org>,
        "Paul E . McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by
 slow console devices

On Fri 2021-11-12 11:08:33, Wander Costa wrote:
> On Fri, Nov 12, 2021 at 5:45 AM Petr Mladek <pmladek@...e.com> wrote:
> > A workaround is to lower console_loglevel and show only the most
> > important messages. Sometimes, a reasonable solution is to ratelimit
> > repeated messages.
> >
> > Which brings up the question: what is the motivation for this patch,
> > please?
> >
> > Is it motivated by a particular bug report?
> > Or does experience show that this report causes more harm than
> > good?
> >
> QA has a test case in which they need to load hundreds of SCSI devices,
> and they simulate it using the scsi_debug driver:

I think that SCSI devices were the first sinner who motivated the work
on console offloading here at SUSE.

> modprobe scsi_debug virtual_gb=1 add_host=2 num_tgts=600
> 
> This dumps a bunch of messages to print and the serial console driver
> cannot keep up with the data rate, causing an RCU stall. The stall is
> reported in an IRQ context, then the ring buffer flush continues from
> there, and then it causes a soft lockup.

I usually suggest reducing console_loglevel as a temporary workaround.
But I am not sure whether that is acceptable in QA.

The level could be lowered only around this test. I mean something like:

# Save the current settings.
CONSOLE_LOGLEVEL=$(cat /proc/sys/kernel/printk)
IGNORE_LOGLEVEL=$(cat /sys/module/printk/parameters/ignore_loglevel)
# Show only messages more severe than KERN_ERR on the console.
echo "3 4 1 7" >/proc/sys/kernel/printk
echo N >/sys/module/printk/parameters/ignore_loglevel

modprobe scsi_debug virtual_gb=1 add_host=2 num_tgts=600

# Restore the original settings.
echo "$CONSOLE_LOGLEVEL" >/proc/sys/kernel/printk
echo "$IGNORE_LOGLEVEL" >/sys/module/printk/parameters/ignore_loglevel
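
If the test can fail in the middle, the restore step should still run.
A minimal sketch of a wrapper that does this could look as follows.
The function name quiet_console and the PRINTK_CTL/IGNORE_CTL variables
are invented for this example; they only exist so that the paths can be
pointed at scratch files when trying it out without root.

```shell
# Hypothetical names: PRINTK_CTL, IGNORE_CTL and quiet_console are
# made up for this sketch. The defaults are the real control files.
PRINTK_CTL=${PRINTK_CTL:-/proc/sys/kernel/printk}
IGNORE_CTL=${IGNORE_CTL:-/sys/module/printk/parameters/ignore_loglevel}

quiet_console() {
    # Remember the current settings; give up if they cannot be read.
    saved_printk=$(cat "$PRINTK_CTL") || return 1
    saved_ignore=$(cat "$IGNORE_CTL") || return 1

    # Limit console output while the command runs.
    echo "3 4 1 7" >"$PRINTK_CTL"
    echo N >"$IGNORE_CTL"

    # Run the wrapped command and keep its exit status.
    "$@"
    status=$?

    # Restore the old settings even when the command failed.
    echo "$saved_printk" >"$PRINTK_CTL"
    echo "$saved_ignore" >"$IGNORE_CTL"
    return "$status"
}
```

It would be used as:
quiet_console modprobe scsi_debug virtual_gb=1 add_host=2 num_tgts=600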


Note that /proc/sys/kernel/printk is a horrible interface. Only the
first number is important here. It is the console loglevel, the limit
used for filtering messages: only messages with a level below it reach
the console. The levels are defined in include/linux/kern_levels.h.
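
As a rough illustration of that filtering, the first number can be
mapped to the KERN_* names from kern_levels.h. This is only a sketch:
shown_levels is a made-up helper, and it assumes that ignore_loglevel
is off, because that parameter bypasses the filter.

```shell
# Hypothetical helper, not part of any kernel tooling.
# $1: contents of /proc/sys/kernel/printk, e.g. "3 4 1 7"
shown_levels() {
    # Keep only the first number (fields are space or tab separated).
    limit=${1%%[!0-9]*}
    [ -n "$limit" ] || return 1

    out=""
    i=0
    # Names in numeric order, KERN_EMERG is 0 and KERN_DEBUG is 7.
    for name in KERN_EMERG KERN_ALERT KERN_CRIT KERN_ERR \
                KERN_WARNING KERN_NOTICE KERN_INFO KERN_DEBUG; do
        # A message reaches the console when its level is below the limit.
        if [ "$i" -lt "$limit" ]; then
            out="${out:+$out }$name"
        fi
        i=$((i + 1))
    done
    echo "$out"
}
```

For example, shown_levels "$(cat /proc/sys/kernel/printk)" lists what
the console currently shows, and shown_levels "3 4 1 7" prints
KERN_EMERG KERN_ALERT KERN_CRIT.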

Best Regards,
Petr