[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZnUydTAC9lbTvF9H@pathway.suse.cz>
Date: Fri, 21 Jun 2024 09:57:41 +0200
From: Petr Mladek <pmladek@...e.com>
To: Andrew Halaney <ahalaney@...hat.com>
Cc: John Ogness <john.ogness@...utronix.de>,
Derek Barbosa <debarbos@...hat.com>, rostedt@...dmis.org,
senozhatsky@...omium.org, linux-rt-users@...r.kernel.org,
linux-kernel@...r.kernel.org, williams@...hat.com,
jlelli@...hat.com, lgoncalv@...hat.com, jwyatt@...hat.com,
aubaker@...hat.com
Subject: Re: [BUG] printk/nbcon.c: watchdog BUG: softlockup - CPU#x stuck for
78s
On Thu 2024-06-20 12:27:12, Andrew Halaney wrote:
> On Wed, Jun 19, 2024 at 11:46:38AM GMT, Petr Mladek wrote:
> > On Tue 2024-06-18 17:52:19, Andrew Halaney wrote:
> > > On Tue, Jun 18, 2024 at 09:03:00PM GMT, John Ogness wrote:
> > > Just in case I did something dumb, here's the module I wrote up:
> > >
> > > ahalaney@...en2nano ~/git/linux-rt-devel (git)-[tags/v6.10-rc4-rt6-rebase] % cat kernel/printk/test_thread.c :(
> > > /*
> > > * Test making a kthread similar to nbcon's (under load)
> > > * to see if it also has issues with migrate_swap()
> > > */
> > > #include "linux/nmi.h"
> > > #include <asm-generic/delay.h>
> > > #include <linux/kthread.h>
> > > #include <linux/module.h>
> > > #include <linux/sched.h>
> > >
> > > DEFINE_STATIC_SRCU(test_srcu);
> > > static DEFINE_SPINLOCK(test_lock);
> > > static struct task_struct *kt;
> > > static bool dont_stop = true;
> > >
> > > static int test_thread_func(void *unused) {
> > > unsigned long flags;
> > >
> > > pr_info("Starting the while true loop\n");
> > > do {
> > > int cookie = srcu_read_lock_nmisafe(&test_srcu);
> > > spin_lock_irqsave(&test_lock, flags);
> > > touch_nmi_watchdog();
> > > udelay(5000); // print a line to serial
> > > spin_unlock_irqrestore(&test_lock, flags);
> > > srcu_read_unlock_nmisafe(&test_srcu, cookie);
> >
> > Does it help to add here?
> >
> > cond_resched();
> >
> > > } while (dont_stop);
> > >
> > > return 0;
> > > }
> > >
> > > static int __init test_thread_init(void) {
> > >
> > > pr_info("Creating test_thread at -20 nice level\n");
> > > kt = kthread_run(test_thread_func, NULL, "test_thread");
> > > if (IS_ERR(kt)) {
> > > pr_err("Failed to make test_thread\n");
> > > return PTR_ERR(kt);
> > > }
> > > sched_set_normal(kt, -20);
> > >
> > > return 0;
> > > }
> > >
> > > static void __exit test_thread_exit(void) {
> > > dont_stop = false;
> > > kthread_stop(kt);
> > > }
> > >
> > > module_init(test_thread_init);
> > > module_exit(test_thread_exit);
> > > MODULE_LICENSE("GPL");
> >
> > The touch_nmi_watchdog() caused that watchdog_timer_fn() did not see
> > that "test_thread" kthread did not schedule. By other words, it did
> > hide the problem.
> >
>
> Is it reasonable to consider removing the touch_nmi_watchdog()'s in
> 8250_port.c? There's some rather old ones, and some new ones with the
> nbcon transition, and they sort of made finding this issue more
> indirect.
>
> Could be some valid reason they exist still, but to me it seems sensible
> to remove if we can't think of any good reasons.
Good point!
I believe that they were added because of flushing printk() messages.
This is the case of commit 54f19b4a6791491 ("tty/serial/8250: Touch
NMI watchdog in wait_for_xmitr"), definitely. The others were added
before git history so that it is more complicated to check it.
Anyway, I think that it is not necessary to touch the watchdog on
every operation on the serial console. It should be enough to
touch them only around writing single printk record/message.
And it is better to do so in the generic printk cycle than
in particular console drivers.
Well, we need to make sure that the watchdog is touched in all
cycles flushing consoles, like console_flush_all() or
__nbcon_atomic_flush_pending_con().
Best Regards,
Petr
Powered by blists - more mailing lists