lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZnUydTAC9lbTvF9H@pathway.suse.cz>
Date: Fri, 21 Jun 2024 09:57:41 +0200
From: Petr Mladek <pmladek@...e.com>
To: Andrew Halaney <ahalaney@...hat.com>
Cc: John Ogness <john.ogness@...utronix.de>,
	Derek Barbosa <debarbos@...hat.com>, rostedt@...dmis.org,
	senozhatsky@...omium.org, linux-rt-users@...r.kernel.org,
	linux-kernel@...r.kernel.org, williams@...hat.com,
	jlelli@...hat.com, lgoncalv@...hat.com, jwyatt@...hat.com,
	aubaker@...hat.com
Subject: Re: [BUG] printk/nbcon.c: watchdog BUG: softlockup - CPU#x stuck for
 78s

On Thu 2024-06-20 12:27:12, Andrew Halaney wrote:
> On Wed, Jun 19, 2024 at 11:46:38AM GMT, Petr Mladek wrote:
> > On Tue 2024-06-18 17:52:19, Andrew Halaney wrote:
> > > On Tue, Jun 18, 2024 at 09:03:00PM GMT, John Ogness wrote:
> > > Just in case I did something dumb, here's the module I wrote up:
> > > 
> > > ahalaney@...en2nano ~/git/linux-rt-devel (git)-[tags/v6.10-rc4-rt6-rebase] % cat kernel/printk/test_thread.c                         :(
> > > /*
> > >  * Test making a kthread similar to nbcon's (under load)
> > >  * to see if it also has issues with migrate_swap()
> > >  */
> > > #include "linux/nmi.h"
> > > #include <asm-generic/delay.h>
> > > #include <linux/kthread.h>
> > > #include <linux/module.h>
> > > #include <linux/sched.h>
> > > 
> > > DEFINE_STATIC_SRCU(test_srcu);
> > > static DEFINE_SPINLOCK(test_lock);
> > > static struct task_struct *kt;
> > > static bool dont_stop = true;
> > > 
> > > static int test_thread_func(void *unused) {
> > > 	unsigned long flags;
> > > 
> > > 	pr_info("Starting the while true loop\n");
> > > 	do {
> > > 		int cookie = srcu_read_lock_nmisafe(&test_srcu);
> > > 		spin_lock_irqsave(&test_lock, flags);
> > > 		touch_nmi_watchdog();
> > > 		udelay(5000);  // print a line to serial
> > > 		spin_unlock_irqrestore(&test_lock, flags);
> > > 		srcu_read_unlock_nmisafe(&test_srcu, cookie);
> > 
> > Does it help to add here?
> > 
> > 		cond_resched();
> > 
> > > 	} while (dont_stop);
> > > 
> > > 	return 0;
> > > }
> > > 
> > > static int __init test_thread_init(void) {
> > > 
> > > 	pr_info("Creating test_thread at -20 nice level\n");
> > > 	kt = kthread_run(test_thread_func, NULL, "test_thread");
> > > 	if (IS_ERR(kt)) {
> > > 		pr_err("Failed to make test_thread\n");
> > > 		return PTR_ERR(kt);
> > > 	}
> > > 	sched_set_normal(kt, -20);
> > > 
> > > 	return 0;
> > > }
> > > 
> > > static void __exit test_thread_exit(void) {
> > > 	dont_stop = false;
> > > 	kthread_stop(kt);
> > > }
> > > 
> > > module_init(test_thread_init);
> > > module_exit(test_thread_exit);
> > > MODULE_LICENSE("GPL");
> > 
> > The touch_nmi_watchdog() caused that watchdog_timer_fn() did not see
> > that "test_thread" kthread did not schedule. By other words, it did
> > hide the problem.
> > 
> 
> Is it reasonable to consider removing the touch_nmi_watchdog()'s in
> 8250_port.c? There's some rather old ones, and some new ones with the
> nbcon transition, and they sort of made finding this issue more
> indirect.
> 
> Could be some valid reason they exist still, but to me it seems sensible
> to remove if we can't think of any good reasons.

Good point!

I believe that they were added because of flushing printk() messages.

This is the case of commit 54f19b4a6791491 ("tty/serial/8250: Touch
NMI watchdog in  wait_for_xmitr"), definitely. The others were added
before git history so that it is more complicated to check it.

Anyway, I think that it is not necessary to touch the watchdog on
every operation on the serial console. It should be enough to
touch them only around writing single printk record/message.
And it is better to do so in the generic printk cycle than
in particular console drivers.

Well, we need to make sure that the watchdog is touched in all
cycles flushing consoles, like console_flush_all() or
__nbcon_atomic_flush_pending_con().

Best Regards,
Petr

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ