linux-kernel - Re: RCU: Call touch_nmi_watchdog() while printing stall warnings

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180110053450.GA1063@jagdpanzerIV>
Date:   Wed, 10 Jan 2018 14:34:50 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: RCU: Call touch_nmi_watchdog() while printing stall warnings

Hello,

On (01/09/18 10:47), Tejun Heo wrote:
> When RCU stall warning triggers, it can print out a lot of messages
> while holding spinlocks.  If the console device is slow (e.g. an
> actual or IPMI serial console), it may end up triggering NMI hard
> lockup watchdog like the following.
> 
> *** CPU printking while holding RCU spinlock
> 
>   PID: 4149739  TASK: ffff881a46baa880  CPU: 13  COMMAND: "CPUThreadPool8"
>    #0 [ffff881fff945e48] crash_nmi_callback at ffffffff8103f7d0
>    #1 [ffff881fff945e58] nmi_handle at ffffffff81020653
>    #2 [ffff881fff945eb0] default_do_nmi at ffffffff81020c36
>    #3 [ffff881fff945ed0] do_nmi at ffffffff81020d32
>    #4 [ffff881fff945ef0] end_repeat_nmi at ffffffff81956a7e
>       [exception RIP: io_serial_in+21]
>       RIP: ffffffff81630e55  RSP: ffff881fff943b88  RFLAGS: 00000002
>       RAX: 000000000000ca00  RBX: ffffffff8230e188  RCX: 0000000000000000
>       RDX: 00000000000002fd  RSI: 0000000000000005  RDI: ffffffff8230e188
>       RBP: ffff881fff943bb0   R8: 0000000000000000   R9: ffffffff820cb3c4
>       R10: 0000000000000019  R11: 0000000000002000  R12: 00000000000026e1
>       R13: 0000000000000020  R14: ffffffff820cd398  R15: 0000000000000035
>       ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
>   --- <NMI exception stack> ---
>    #5 [ffff881fff943b88] io_serial_in at ffffffff81630e55
>    #6 [ffff881fff943b90] wait_for_xmitr at ffffffff8163175c
>    #7 [ffff881fff943bb8] serial8250_console_putchar at ffffffff816317dc
>    #8 [ffff881fff943bd8] uart_console_write at ffffffff8162ac00
>    #9 [ffff881fff943c08] serial8250_console_write at ffffffff81634691
>   #10 [ffff881fff943c80] univ8250_console_write at ffffffff8162f7c2
>   #11 [ffff881fff943c90] console_unlock at ffffffff810dfc55
>   #12 [ffff881fff943cf0] vprintk_emit at ffffffff810dffb5
>   #13 [ffff881fff943d50] vprintk_default at ffffffff810e01bf
>   #14 [ffff881fff943d60] vprintk_func at ffffffff810e1127
>   #15 [ffff881fff943d70] printk at ffffffff8119a8a4
>   #16 [ffff881fff943dd0] print_cpu_stall_info at ffffffff810eb78c
>   #17 [ffff881fff943e88] rcu_check_callbacks at ffffffff810ef133
>   #18 [ffff881fff943ee8] update_process_times at ffffffff810f3497
>   #19 [ffff881fff943f10] tick_sched_timer at ffffffff81103037
>   #20 [ffff881fff943f38] __hrtimer_run_queues at ffffffff810f3f38
>   #21 [ffff881fff943f88] hrtimer_interrupt at ffffffff810f442b

Tejun, while we are at it, shall we also touch NMI in show_workqueue_state()?
I suppose that show_pwq() can be quite verbose on some systems, can't it?

for instance: sysrq-t [irq] -> show_workqueue_state()

---

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9960f8842b31..649c00a9af12 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -48,6 +48,7 @@
 #include <linux/moduleparam.h>
 #include <linux/uaccess.h>
 #include <linux/sched/isolation.h>
+#include <linux/nmi.h>
 
 #include "workqueue_internal.h"
 
@@ -4473,6 +4474,8 @@ void show_workqueue_state(void)
 			if (pwq->nr_active || !list_empty(&pwq->delayed_works))
 				show_pwq(pwq);
 			spin_unlock_irqrestore(&pwq->pool->lock, flags);
+
+			touch_nmi_watchdog();
 		}
 	}