Message-ID: <87o77xa584.fsf@jogness.linutronix.de>
Date: Wed, 19 Jun 2024 07:15:31 +0206
From: John Ogness <john.ogness@...utronix.de>
To: Andrew Halaney <ahalaney@...hat.com>, tglx@...utronix.de
Cc: Derek Barbosa <debarbos@...hat.com>, pmladek@...e.com,
 rostedt@...dmis.org, senozhatsky@...omium.org,
 linux-rt-users@...r.kernel.org, linux-kernel@...r.kernel.org,
 williams@...hat.com, jlelli@...hat.com, lgoncalv@...hat.com,
 jwyatt@...hat.com, aubaker@...hat.com
Subject: Re: [BUG] printk/nbcon.c: watchdog BUG: softlockup - CPU#x stuck
 for 78s

[ Explicitly added tglx, hoping he can chime in here. ]

On 2024-06-18, Andrew Halaney <ahalaney@...hat.com> wrote:
>> Shouldn't the scheduler eventually kick the task off the CPU after
>> its timeslice is up?
>
> I trust you better than myself about this, but this is being
> reproduced with a CONFIG_PREEMPT_DYNAMIC=y +
> CONFIG_PREEMPT_VOLUNTARY=y setup (so essentially the current mode is
> VOLUNTARY). Does that actually work that way for a kthread in that
> mode?

It would be good not to trust me better than yourself. I actually have
very little experience with the non-RT preemption models. I will need to
investigate this further.

> Just in case I did something dumb, here's the module I wrote up:
>
> ahalaney@...en2nano ~/git/linux-rt-devel (git)-[tags/v6.10-rc4-rt6-rebase] % cat kernel/printk/test_thread.c                         :(
> /*
>  * Test making a kthread similar to nbcon's (under load)
>  * to see if it also has issues with migrate_swap()
>  */
> #include <linux/delay.h>
> #include <linux/kthread.h>
> #include <linux/module.h>
> #include <linux/nmi.h>
> #include <linux/sched.h>
> #include <linux/spinlock.h>
> #include <linux/srcu.h>
>
> DEFINE_STATIC_SRCU(test_srcu);
> static DEFINE_SPINLOCK(test_lock);
> static struct task_struct *kt;
> static bool dont_stop = true;
>
> static int test_thread_func(void *unused) {
> 	unsigned long flags;
>
> 	pr_info("Starting the while true loop\n");
> 	do {
> 		int cookie = srcu_read_lock_nmisafe(&test_srcu);
> 		spin_lock_irqsave(&test_lock, flags);
> 		touch_nmi_watchdog();
> 		udelay(5000);  // print a line to serial
> 		spin_unlock_irqrestore(&test_lock, flags);
> 		srcu_read_unlock_nmisafe(&test_srcu, cookie);
> 	} while (dont_stop);
>
> 	return 0;
> }
>
> static int __init test_thread_init(void) {
>
> 	pr_info("Creating test_thread at -20 nice level\n");
> 	kt = kthread_run(test_thread_func, NULL, "test_thread");
> 	if (IS_ERR(kt)) {
> 		pr_err("Failed to make test_thread\n");
> 		return PTR_ERR(kt);
> 	}
> 	sched_set_normal(kt, -20);
>
> 	return 0;
> }
>
> static void __exit test_thread_exit(void) {
> 	dont_stop = false;
> 	kthread_stop(kt);
> }
>
> module_init(test_thread_init);
> module_exit(test_thread_exit);
> MODULE_LICENSE("GPL");

Thanks for the functional test! This should quite accurately reproduce
the situation where the printing thread is unable to keep up with the
volume of incoming messages.

An explicit call to trigger the scheduler may be needed, such as adding
cond_resched() outside the critical section, before repeating the loop.
We would like to remove such explicit preemption points from the kernel
code, but perhaps they are necessary for the VOLUNTARY preemption
scheme.
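
To illustrate where such a preemption point would sit, here is a
minimal userspace mock-up of the loop shape. It is only a sketch and
not kernel code: stub_lock(), stub_unlock(), stub_cond_resched() and
run_loop() are hypothetical stand-ins that I made up for illustration,
and sched_yield() is just a rough userspace approximation of offering
the CPU to the scheduler.

```c
/* Userspace mock-up only: the stub_* helpers stand in for kernel primitives. */
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

static bool lock_held;     /* models test_lock being held */
static int resched_calls;  /* counts voluntary preemption points */

/* Stand-in for spin_lock_irqsave(&test_lock, flags). */
static void stub_lock(void)
{
	assert(!lock_held);
	lock_held = true;
}

/* Stand-in for spin_unlock_irqrestore(&test_lock, flags). */
static void stub_unlock(void)
{
	assert(lock_held);
	lock_held = false;
}

/* Stand-in for cond_resched(): must never run inside the critical section. */
static void stub_cond_resched(void)
{
	assert(!lock_held);
	resched_calls++;
	sched_yield();	/* userspace approximation of offering the CPU */
}

/* Same shape as test_thread_func(), bounded so that it terminates. */
static int run_loop(int iterations)
{
	int done = 0;

	while (done < iterations) {
		stub_lock();
		/* critical section: udelay(5000) in the module */
		stub_unlock();

		/* explicit preemption point, before repeating the loop */
		stub_cond_resched();
		done++;
	}
	return done;
}
```

In the test module this would presumably correspond to a cond_resched()
call placed after srcu_read_unlock_nmisafe() and before the
while (dont_stop) check. I have not tested that change.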

John
