[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <087e8f711ab1814b7e035744256f3948d63aaf7f.camel@gmx.de>
Date: Wed, 01 Oct 2025 08:00:49 +0200
From: Mike Galbraith <efault@....de>
To: Sebastian Siewior <bigeasy@...utronix.de>, John Ogness
<john.ogness@...utronix.de>
Cc: Calvin Owens <calvin@...nvd.org>, Breno Leitao <leitao@...ian.org>, Petr
Mladek <pmladek@...e.com>, Simon Horman <horms@...nel.org>,
kuba@...nel.org, Pavel Begunkov <asml.silence@...il.com>, Johannes Berg
<johannes@...solutions.net>, paulmck@...nel.org, LKML
<linux-kernel@...r.kernel.org>, netdev@...r.kernel.org,
boqun.feng@...il.com, Sergey Senozhatsky <senozhatsky@...omium.org>, Steven
Rostedt <rostedt@...dmis.org>
Subject: Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning
On Tue, 2025-09-30 at 19:35 +0200, Mike Galbraith wrote:
>
> I've been using a test coverage and monitoring patchlet for years that
> let's RT relax local exclusion to better suit its needs, substituting
> BH for IRQ exclusion....
BTW, that approach isn't perfect, but along with Breno's WIP nbcon
goodies it works pretty well. His netconsole tests are happy, RT can
do mundane netpoll business just fine (zzz).
The interesting bit is that RT can now also do some not so mundane:
netconsole.sh bart
modprobe crashtest
echo crash_nmi_kill > /sys/kernel/debug/crashtest (1 2 3 NMI punch)
./runltp (busy work for box)
bart's log: Look for "NMI watchdog"+PING!|WARN!|BOOM!
1 2 3
<record start/restart tag>
[ 161.779133] netconsole: network logging started
[ 219.873700] Process accounting resumed
[ 220.020136] Process accounting resumed
[ 229.208910] UDPLite: UDP-Lite is deprecated and scheduled to be removed in 2025, please contact the netdev mailing list
[ 229.209355] UDPLite6: UDP-Lite is deprecated and scheduled to be removed in 2025, please contact the netdev mailing list
[ 230.377870] watchdog: NMI watchdog PING!
[ 232.216928] NMI watchdog WARN!
[ 232.216932] WARNING: CPU: 1 PID: 4890 at kernel/watchdog.c:204 watchdog_hardlockup_crashtest+0x5b/0x60
[ 232.216940] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ 232.216941] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 4890, name: bind06
[ 232.216941] preempt_count: 110002, expected: 0
[ 232.216942] RCU nest depth: 1, expected: 1
[ 232.216943] Preemption disabled at:
[ 232.216943] [<ffffffff81c0ddd3>] schedule+0x23/0x90
[ 232.216948] CPU: 1 UID: 0 PID: 4890 Comm: bind06 Kdump: loaded Not tainted 6.17.0.ge5f0a698-master-nbcon-rt #154 PREEMPT_{RT,(lazy)}
[ 232.216950] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 232.216951] Call Trace:
[ 232.216952] <TASK>
[ 232.216954] dump_stack_lvl+0x5b/0x80
[ 232.216957] ? schedule+0x23/0x90
[ 232.216959] __might_resched.cold+0xcb/0x104
[ 232.216963] rt_spin_lock+0x36/0xc0
[ 232.216965] ___slab_alloc+0x7d/0x730
[ 232.216967] ? __alloc_skb+0x135/0x170
[ 232.216970] ? info_print_prefix+0xa7/0xd0
[ 232.216972] ? record_print_text+0x1cc/0x2b0
[ 232.216974] ? __alloc_skb+0x135/0x170
[ 232.216975] kmem_cache_alloc_node_noprof+0x7a/0x1c0
[ 232.216977] __alloc_skb+0x135/0x170
[ 232.216979] netpoll_prepare_skb+0x84/0x390 [netconsole]
[ 232.216983] netcon_write_thread+0xbb/0xf0 [netconsole]
[ 232.216986] nbcon_emit_next_record+0x24a/0x270
[ 232.216989] __nbcon_atomic_flush_pending_con+0x9a/0xf0
[ 232.216991] __nbcon_atomic_flush_pending+0xba/0x130
[ 232.216994] vprintk_emit+0x253/0x4d0
[ 232.216996] _printk+0x4c/0x50
[ 232.216999] ? vprintk_store+0x3f6/0x4a0
[ 232.217000] ? watchdog_hardlockup_crashtest+0x5b/0x60
[ 232.217002] ? watchdog_hardlockup_crashtest+0x5b/0x60
[ 232.217003] __warn.cold+0x89/0xf0
[ 232.217005] ? watchdog_hardlockup_crashtest+0x5b/0x60
[ 232.217007] report_bug+0xe6/0x170
[ 232.217010] ? watchdog_hardlockup_crashtest+0x5b/0x60
[ 232.217012] ? watchdog_hardlockup_crashtest+0x5d/0x60
[ 232.217013] handle_bug+0xe6/0x140
[ 232.217016] exc_invalid_op+0x17/0x60
[ 232.217017] asm_exc_invalid_op+0x1a/0x20
[ 232.217019] RIP: 0010:watchdog_hardlockup_crashtest+0x5b/0x60
[ 232.217021] Code: c0 0f 84 2d d3 e5 ff 8b 15 02 94 8f 01 85 d2 0f 84 2b d3 e5 ff 83 f8 01 0f 8f 22 d3 e5 ff 48 c7 c7 06 2a 16 82 e8 85 04 f1 ff <0f> 0b c3 c3 90 66 0f 1f 00 0f 1f 44 00 00 41 57 41 56 41 55 41 89
[ 232.217022] RSP: 0000:ffff88810667bc18 EFLAGS: 00010096
[ 232.217024] RAX: 0000000000000012 RBX: ffff88810667bf58 RCX: 0000000000000000
[ 232.217025] RDX: 0000000080110002 RSI: 0000000000000001 RDI: 0000000000000001
[ 232.217026] RBP: ffff88810667bf58 R08: 00000000ffffbfff R09: ffff88840ebfffa8
[ 232.217026] R10: ffff88840eaa0000 R11: 0000000000000002 R12: 0000000000000000
[ 232.217027] R13: 0000000000000001 R14: ffffffffffffffff R15: ffff888100b00000
[ 232.217029] ? watchdog_hardlockup_crashtest+0x5b/0x60
[ 232.217030] watchdog_hardlockup_check+0x1e/0xe0
[ 232.217032] __perf_event_overflow+0x111/0x3b0
[ 232.217035] ? perf_event_update_userpage+0x17/0x140
[ 232.217037] ? packet_notifier+0x74/0x220 [af_packet]
[ 232.217039] ? ndisc_netdev_event+0xda/0x2b0
[ 232.217042] ? effective_cpu_util+0x2f/0xc0
[ 232.217045] handle_pmi_common+0x1b5/0x3d0
[ 232.217047] ? sugov_update_single_freq+0xfe/0x230
[ 232.217050] ? sugov_get_util+0x2c/0x70
[ 232.217051] ? sugov_update_single_freq+0xfe/0x230
[ 232.217053] ? update_load_avg+0x3c5/0x3d0
[ 232.217054] ? update_curr+0x90/0x1b0
[ 232.217056] ? nohz_balance_exit_idle+0x1a/0xa0
[ 232.217058] ? nohz_balancer_kick+0x28b/0x320
[ 232.217060] ? perf_event_task_tick+0x73/0xc0
[ 232.217062] intel_pmu_handle_irq+0x116/0x2a0
[ 232.217063] perf_event_nmi_handler+0x2a/0x50
[ 232.217066] nmi_handle.part.0+0x53/0x110
[ 232.217069] default_do_nmi+0x15f/0x170
[ 232.217070] exc_nmi+0xef/0x110
[ 232.216926] ------------[ cut here ]------------
[ 232.217071] asm_exc_nmi+0x74/0xbd
[ 232.217073] RIP: 0033:0x406ca5
[ 232.217075] Code: 00 84 d2 0f 85 b2 00 00 00 8b 15 b2 36 03 00 8b 05 b0 36 03 00 39 c2 7d 24 8b 05 aa 36 03 00 85 c0 75 1a 83 05 73 36 03 00 01 <8b> 15 91 36 03 00 8b 05 8f 36 03 00 39 c2 7c df 0f 1f 00 8b 0d 7a
[ 232.217075] RSP: 002b:00007fff3e364580 EFLAGS: 00000202
[ 232.217076] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000fb62
[ 232.217077] RDX: 0000000000000092 RSI: 000000000043a2c8 RDI: 0000000000000004
[ 232.217078] RBP: 0000000000000000 R08: 00007f47ecdbd128 R09: 0000000000000000
[ 232.217078] R10: 00007f47ecdbd0e8 R11: 000000000000fb62 R12: 0000000000000000
[ 232.217079] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000438db8
[ 232.217080] </TASK>
[ 232.217140] Modules linked in: crashtest netconsole af_packet nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bridge stp llc iscsi_ibft iscsi_boot_sysfs rfkill nfnetlink binfmt_misc nls_iso8859_1 nls_cp437 snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec intel_rapl_msr snd_hwdep intel_rapl_common snd_hda_core x86_pkg_temp_thermal intel_powerclamp coretemp r8169 snd_pcm realtek mdio_devres kvm_intel at24 snd_timer libphy regmap_i2c iTCO_wdt snd intel_pmc_bxt mei_hdcp iTCO_vendor_support i2c_i801 mei_me lpc_ich usblp kvm joydev irqbypass pcspkr i2c_smbus mdio_bus mei soundcore mfd_core fan thermal uas usb_storage intel_smartconnect nfsd auth_rpcgss nfs_acl lockd sch_fq_codel grace sunrpc dm_mod fuse configfs dmi_sysfs hid_generic nouveau usbhid drm_ttm_helper ttm drm_client_lib gpu_sched i2c_algo_bit drm_gpuvm drm_exec drm_display_helper
[ 232.217181] drm_kms_helper xhci_pci xhci_hcd drm ghash_clmulni_intel ehci_pci ahci ehci_hcd libahci libata usbcore video wmi usb_common button sd_mod scsi_dh_emc scsi_dh_rdac scsi_dh_alua sg scsi_mod scsi_common vfat fat ext4 crc16 mbcache jbd2 loop msr efivarfs autofs4 aesni_intel gf128mul
[ 232.217198] CPU: 1 UID: 0 PID: 4890 Comm: bind06 Kdump: loaded Tainted: G W 6.17.0.ge5f0a698-master-nbcon-rt #154 PREEMPT_{RT,(lazy)}
[ 232.217201] Tainted: [W]=WARN
[ 232.217202] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 232.217203] RIP: 0010:watchdog_hardlockup_crashtest+0x5b/0x60
[ 232.217206] Code: c0 0f 84 2d d3 e5 ff 8b 15 02 94 8f 01 85 d2 0f 84 2b d3 e5 ff 83 f8 01 0f 8f 22 d3 e5 ff 48 c7 c7 06 2a 16 82 e8 85 04 f1 ff <0f> 0b c3 c3 90 66 0f 1f 00 0f 1f 44 00 00 41 57 41 56 41 55 41 89
[ 232.217208] RSP: 0000:ffff88810667bc18 EFLAGS: 00010096
[ 232.217210] RAX: 0000000000000012 RBX: ffff88810667bf58 RCX: 0000000000000000
[ 232.217211] RDX: 0000000080110002 RSI: 0000000000000001 RDI: 0000000000000001
[ 232.217213] RBP: ffff88810667bf58 R08: 00000000ffffbfff R09: ffff88840ebfffa8
[ 232.217214] R10: ffff88840eaa0000 R11: 0000000000000002 R12: 0000000000000000
[ 232.217215] R13: 0000000000000001 R14: ffffffffffffffff R15: ffff888100b00000
[ 232.217217] FS: 00007f47ecd7e740(0000) GS:ffff88848c02c000(0000) knlGS:0000000000000000
[ 232.217219] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 232.217220] CR2: 00007f61356005a0 CR3: 000000012281e005 CR4: 00000000001726f0
[ 232.217222] Call Trace:
[ 232.217223] <TASK>
[ 232.217224] watchdog_hardlockup_check+0x1e/0xe0
[ 232.217227] __perf_event_overflow+0x111/0x3b0
[ 232.217230] ? perf_event_update_userpage+0x17/0x140
[ 232.217232] ? packet_notifier+0x74/0x220 [af_packet]
[ 232.217234] ? ndisc_netdev_event+0xda/0x2b0
[ 232.217237] ? effective_cpu_util+0x2f/0xc0
[ 232.217240] handle_pmi_common+0x1b5/0x3d0
[ 232.217242] ? sugov_update_single_freq+0xfe/0x230
[ 232.217244] ? sugov_get_util+0x2c/0x70
[ 232.217246] ? sugov_update_single_freq+0xfe/0x230
[ 232.217248] ? update_load_avg+0x3c5/0x3d0
[ 232.217251] ? update_curr+0x90/0x1b0
[ 232.217253] ? nohz_balance_exit_idle+0x1a/0xa0
[ 232.217256] ? nohz_balancer_kick+0x28b/0x320
[ 232.217259] ? perf_event_task_tick+0x73/0xc0
[ 232.217262] intel_pmu_handle_irq+0x116/0x2a0
[ 232.217264] perf_event_nmi_handler+0x2a/0x50
[ 232.217267] nmi_handle.part.0+0x53/0x110
[ 232.217269] default_do_nmi+0x15f/0x170
[ 232.217271] exc_nmi+0xef/0x110
[ 232.217273] asm_exc_nmi+0x74/0xbd
[ 232.217274] RIP: 0033:0x406ca5
[ 232.217276] Code: 00 84 d2 0f 85 b2 00 00 00 8b 15 b2 36 03 00 8b 05 b0 36 03 00 39 c2 7d 24 8b 05 aa 36 03 00 85 c0 75 1a 83 05 73 36 03 00 01 <8b> 15 91 36 03 00 8b 05 8f 36 03 00 39 c2 7c df 0f 1f 00 8b 0d 7a
[ 232.217277] RSP: 002b:00007fff3e364580 EFLAGS: 00000202
[ 232.217279] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000fb62
[ 232.217280] RDX: 0000000000000092 RSI: 000000000043a2c8 RDI: 0000000000000004
[ 232.217282] RBP: 0000000000000000 R08: 00007f47ecdbd128 R09: 0000000000000000
[ 232.217283] R10: 00007f47ecdbd0e8 R11: 000000000000fb62 R12: 0000000000000000
[ 232.217284] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000438db8
[ 232.217286] </TASK>
[ 232.217287] ---[ end trace 0000000000000000 ]---
[ 232.217293] perf: interrupt took too long (2857 > 2500), lowering kernel.perf_event_max_sample_rate to 70000
[ 233.447090] Kernel panic - not syncing: NMI watchdog BOOM!
[ 233.447096] CPU: 2 UID: 0 PID: 4891 Comm: bind06 Kdump: loaded Tainted: G W 6.17.0.ge5f0a698-master-nbcon-rt #154 PREEMPT_{RT,(lazy)}
[ 233.447099] Tainted: [W]=WARN
[ 233.447101] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 233.447103] Call Trace:
[ 233.447106] <TASK>
[ 233.447109] dump_stack_lvl+0x5b/0x80
[ 233.447116] vpanic+0xca/0x290
[ 233.447120] panic+0x4c/0x4c
[ 233.447124] watchdog_hardlockup_crashtest.cold+0xc/0x18
[ 233.447129] watchdog_hardlockup_check+0x1e/0xe0
[ 233.447136] __perf_event_overflow+0x111/0x3b0
[ 233.447141] ? perf_event_update_userpage+0x17/0x140
[ 233.447145] ? effective_cpu_util+0x2f/0xc0
[ 233.447151] handle_pmi_common+0x1b5/0x3d0
[ 233.447155] ? sugov_update_single_freq+0xfe/0x230
[ 233.447160] ? sugov_get_util+0x2c/0x70
[ 233.447163] ? sugov_update_single_freq+0xfe/0x230
[ 233.447167] ? update_load_avg+0x3c5/0x3d0
[ 233.447172] ? update_curr+0x90/0x1b0
[ 233.447176] ? nohz_balance_exit_idle+0x1a/0xa0
[ 233.447179] ? nohz_balancer_kick+0x28b/0x320
[ 233.447183] ? perf_event_task_tick+0x73/0xc0
[ 233.447186] intel_pmu_handle_irq+0x116/0x2a0
[ 233.447190] perf_event_nmi_handler+0x2a/0x50
[ 233.447194] nmi_handle.part.0+0x53/0x110
[ 233.447199] default_do_nmi+0x15f/0x170
[ 233.447202] exc_nmi+0xef/0x110
[ 233.447204] asm_exc_nmi+0x74/0xbd
[ 233.447208] RIP: 0033:0x406018
[ 233.447212] Code: c0 0f 85 6b fe ff ff 83 05 1c 43 03 00 01 8b 15 3e 43 03 00 8b 05 34 43 03 00 39 c2 7c db e9 4f fe ff ff 0f 1f 80 00 00 00 00 <8b> 05 26 43 03 00 85 c0 0f 85 89 fe ff ff 8b 15 14 43 03 00 8b 05
[ 233.447214] RSP: 002b:00007f47ec9feec0 EFLAGS: 00000297
[ 233.447218] RAX: 00000000000000cd RBX: 0000000000000000 RCX: 000000000000fdca
[ 233.447221] RDX: 00000000000000cc RSI: 000000000043a2d8 RDI: 0000000000000004
[ 233.447222] RBP: 0000000000000000 R08: 00007f47ecdbd128 R09: 0000000000000000
[ 233.447224] R10: 00007f47ecdbd0e8 R11: 000000000000fdca R12: ffffffffffffff88
[ 233.447226] R13: 0000000000000000 R14: 00007fff3e364350 R15: 00007f47ec1ff000
[ 233.447230] </TASK>
Powered by blists - more mailing lists