[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <202406201623.d1a3b599-lkp@intel.com>
Date: Thu, 20 Jun 2024 16:46:37 +0800
From: kernel test robot <oliver.sang@...el.com>
To: "Ricardo B. Marliere" <ricardo@...liere.net>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Tejun Heo <tj@...nel.org>,
<oliver.sang@...el.com>
Subject: [linus:master] [workqueue] 5df9197edd:
BUG:soft_lockup-CPU##stuck_for#s![swapper:#]
Hello,
kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![swapper:#]" on:
commit: 5df9197edd731a44682444631d68c2384f381bf1 ("workqueue: make wq_subsys const")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linus/master d20f6b3d747c36889b7ce75ee369182af3decb6b]
[test failed on linux-next/master 6906a84c482f098d31486df8dc98cead21cce2d0]
in testcase: boot
compiler: gcc-13
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
we don't have enough knowledge why this commit could cause random issues,
while the parent keeps clean even when we run both till 999 times.
=========================================================================================
compiler/kconfig/rootfs/sleep/tbox_group/testcase:
gcc-13/i386-randconfig-r015-20211011/debian-11.1-i386-20220923.cgz/1/vm-snb/boot
f297a3844aa059c5 5df9197edd731a44682444631d6
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:999 6% 56:999 dmesg.BUG:kernel_hang_in_boot_stage
:999 1% 8:999 dmesg.BUG:kernel_hang_in_test_stage
:999 0% 1:999 dmesg.BUG:soft_lockup-CPU##stuck_for#s![kcompactd0:#]
:999 0% 3:999 dmesg.BUG:soft_lockup-CPU##stuck_for#s![kworker##:#]
:999 9% 89:999 dmesg.BUG:soft_lockup-CPU##stuck_for#s![swapper:#]
:999 10% 102:999 dmesg.BUG:workqueue_lockup-pool
:999 1% 7:999 dmesg.EIP:__do_softirq
:999 0% 1:999 dmesg.EIP:__down_trylock_console_sem
:999 1% 10:999 dmesg.EIP:_raw_spin_unlock_irq
:999 0% 3:999 dmesg.EIP:_raw_spin_unlock_irqrestore
:999 0% 2:999 dmesg.EIP:console_flush_all
:999 0% 1:999 dmesg.EIP:console_unlock
:999 1% 7:999 dmesg.EIP:lock_acquire
:999 1% 7:999 dmesg.EIP:lock_release
:999 1% 12:999 dmesg.EIP:raw_spin_rq_unlock_irq
:999 0% 1:999 dmesg.EIP:rcu_check_gp_start_stall
:999 0% 2:999 dmesg.EIP:seqcount_lockdep_reader_access
:999 0% 1:999 dmesg.EIP:tcp_orphan_update
:999 9% 88:999 dmesg.EIP:tick_clock_notify
:999 0% 1:999 dmesg.EIP:vprintk_store
:999 0% 1:999 dmesg.EIP:vsnprintf
:999 2% 15:999 dmesg.INFO:rcu_preempt_detected_stalls_on_CPUs/tasks
:999 0% 3:999 dmesg.INFO:rcu_preempt_self-detected_stall_on_CPU
:999 3% 29:999 dmesg.INFO:task_blocked_for_more_than#seconds
:999 2% 16:999 dmesg.IP-Config:Auto-configuration_of_network_failed
:999 3% 29:999 dmesg.Kernel_panic-not_syncing:hung_task:blocked_tasks
:999 9% 93:999 dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
:999 0% 1:999 dmesg.WARNING:at_kernel/rcu/tree_stall.h:#rcu_check_gp_start_stall
one hint we could supply is the config for this test is a randconfig which is
uploaded to [1]
at the same time, we also tried another config as attached
not-reproduce-config-6.8.0-rc2-00006-g5df9197edd73
both parent/this commit seem clean in the tests for this config.
=========================================================================================
compiler/kconfig/rootfs/sleep/tbox_group/testcase:
gcc-13/i386-debian-10.3-func/debian-11.1-i386-20220923.cgz/1/vm-snb/boot
f297a3844aa059c5 5df9197edd731a44682444631d6
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
999:999 0% 999:999 pass
but since the diff between two configs are very big, it's hard for us to
supply further information.
just made this report FYI what we observed in our tests.
if you have any debug patch want us to test, please let us know. Thanks
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202406201623.d1a3b599-lkp@intel.com
[ 696.522062][ C0] watchdog: BUG: soft lockup - CPU#0 stuck for 626s! [swapper:1]
[ 696.522062][ C0] Modules linked in:
[ 696.522062][ C0] irq event stamp: 110942
[ 696.522062][ C0] hardirqs last enabled at (110941): timekeeping_notify (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 arch/x86/include/asm/irqflags.h:135 include/linux/stop_machine.h:154 include/linux/stop_machine.h:161 kernel/time/timekeeping.c:1506)
[ 696.522062][ C0] hardirqs last disabled at (110942): sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076)
[ 696.522062][ C0] softirqs last enabled at (110790): __do_softirq (arch/x86/include/asm/preempt.h:26 kernel/softirq.c:400 kernel/softirq.c:582)
[ 696.522062][ C0] softirqs last disabled at (110779): do_softirq_own_stack (arch/x86/kernel/irq_32.c:57 arch/x86/kernel/irq_32.c:147)
[ 696.522062][ C0] CPU: 0 PID: 1 Comm: swapper Not tainted 6.8.0-rc2-00006-g5df9197edd73 #1
[ 696.522062][ C0] EIP: tick_clock_notify (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/time/tick-sched.c:1607)
[ 696.522062][ C0] Code: 15 d4 1b 8e 98 8b 45 e4 8b 55 e8 a3 d8 1b 8e 98 89 15 dc 1b 8e 98 89 35 c0 1b 8e 98 89 1d c4 1b 8e 98 83 c4 10 5b 5e 5f 5d c3 <80> 0d e4 1b 8e 98 01 c3 80 0d e4 1b 8e 98 01 c3 0f ba 35 e4 1b 8e
All code
========
0: 15 d4 1b 8e 98 adc $0x988e1bd4,%eax
5: 8b 45 e4 mov -0x1c(%rbp),%eax
8: 8b 55 e8 mov -0x18(%rbp),%edx
b: a3 d8 1b 8e 98 89 15 movabs %eax,0x1bdc1589988e1bd8
12: dc 1b
14: 8e 98 89 35 c0 1b mov 0x1bc03589(%rax),%ds
1a: 8e 98 89 1d c4 1b mov 0x1bc41d89(%rax),%ds
20: 8e 98 83 c4 10 5b mov 0x5b10c483(%rax),%ds
26: 5e pop %rsi
27: 5f pop %rdi
28: 5d pop %rbp
29: c3 ret
2a:* 80 0d e4 1b 8e 98 01 orb $0x1,-0x6771e41c(%rip) # 0xffffffff988e1c15 <-- trapping instruction
31: c3 ret
32: 80 0d e4 1b 8e 98 01 orb $0x1,-0x6771e41c(%rip) # 0xffffffff988e1c1d
39: c3 ret
3a: 0f .byte 0xf
3b: ba 35 e4 1b 8e mov $0x8e1be435,%edx
Code starting with the faulting instruction
===========================================
0: 80 0d e4 1b 8e 98 01 orb $0x1,-0x6771e41c(%rip) # 0xffffffff988e1beb
7: c3 ret
8: 80 0d e4 1b 8e 98 01 orb $0x1,-0x6771e41c(%rip) # 0xffffffff988e1bf3
f: c3 ret
10: 0f .byte 0xf
11: ba 35 e4 1b 8e mov $0x8e1be435,%edx
[ 696.522062][ C0] EAX: 0001b15d EBX: 984678a0 ECX: 00000006 EDX: 83f6d220
[ 696.522062][ C0] ESI: 988e0820 EDI: 00000001 EBP: 82d17ed8 ESP: 82d17ec0
[ 696.522062][ C0] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00000202
[ 696.522062][ C0] CR0: 80050033 CR2: ffda9000 CR3: 18c90000 CR4: 000406d0
[ 696.522062][ C0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 696.522062][ C0] DR6: fffe0ff0 DR7: 00000400
[ 696.522062][ C0] Call Trace:
[ 696.522062][ C0] ? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1))
[ 696.522062][ C0] ? watchdog_timer_fn (kernel/watchdog.c:545)
[ 696.522062][ C0] ? __hrtimer_run_queues+0x10b/0x196
[ 696.522062][ C0] ? schedule_work (drivers/usb/core/hub.c:890)
[ 696.522062][ C0] ? hrtimer_run_queues (kernel/time/hrtimer.c:1928)
[ 696.522062][ C0] ? update_process_times (kernel/time/timer.c:2065 (discriminator 2) kernel/time/timer.c:2086 (discriminator 2))
[ 696.522062][ C0] ? tick_periodic+0xb0/0xb2
[ 696.522062][ C0] ? tick_handle_periodic (kernel/time/tick-common.c:145)
[ 696.522062][ C0] ? sysvec_hyperv_stimer0 (arch/x86/kernel/apic/apic.c:1076)
[ 696.522062][ C0] ? __sysvec_apic_timer_interrupt (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/jump_label.h:260 include/linux/jump_label.h:270 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1083)
[ 696.522062][ C0] ? sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 4))
[ 696.522062][ C0] ? handle_exception (arch/x86/entry/entry_32.S:1056)
[ 696.522062][ C0] ? sysvec_hyperv_stimer0 (arch/x86/kernel/apic/apic.c:1076)
[ 696.522062][ C0] ? tick_cancel_sched_timer (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/time/tick-sched.c:1607)
[ 696.522062][ C0] ? sysvec_hyperv_stimer0 (arch/x86/kernel/apic/apic.c:1076)
[ 696.522062][ C0] ? tick_cancel_sched_timer (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/time/tick-sched.c:1607)
[ 696.522062][ C0] ? timekeeping_notify (kernel/time/timekeeping.c:1508)
[ 696.522062][ C0] __clocksource_select (kernel/time/clocksource.c:1047 (discriminator 1))
[ 696.522062][ C0] ? boot_override_clock (kernel/time/clocksource.c:1079)
[ 696.522062][ C0] clocksource_select (kernel/time/clocksource.c:1064)
[ 696.522062][ C0] clocksource_done_booting (kernel/time/clocksource.c:1088)
[ 696.522062][ C0] do_one_initcall (init/main.c:1236)
[ 696.522062][ C0] ? parse_args (kernel/params.c:168)
[ 696.522062][ C0] do_initcalls (init/main.c:1297 (discriminator 3) init/main.c:1314 (discriminator 3))
[ 696.522062][ C0] ? rest_init (init/main.c:1433)
[ 696.522062][ C0] kernel_init_freeable (init/main.c:1553)
[ 696.522062][ C0] kernel_init (init/main.c:1443)
[ 696.522062][ C0] ret_from_fork (arch/x86/kernel/process.c:153)
[ 696.522062][ C0] ? rest_init (init/main.c:1433)
[ 696.522062][ C0] ret_from_fork_asm (arch/x86/entry/entry_32.S:741)
[ 696.522062][ C0] entry_INT80_32 (arch/x86/entry/entry_32.S:947)
[ 696.522062][ C0] Kernel panic - not syncing: softlockup: hung tasks
[ 696.522062][ C0] CPU: 0 PID: 1 Comm: swapper Tainted: G L 6.8.0-rc2-00006-g5df9197edd73 #1
[ 696.522062][ C0] Call Trace:
[ 696.522062][ C0] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
[ 696.522062][ C0] dump_stack (lib/dump_stack.c:114)
[ 696.522062][ C0] panic (kernel/panic.c:362)
[ 696.522062][ C0] watchdog_timer_fn (kernel/watchdog.c:519)
[ 696.522062][ C0] __hrtimer_run_queues+0x10b/0x196
[ 696.522062][ C0] ? schedule_work (drivers/usb/core/hub.c:890)
[ 696.522062][ C0] hrtimer_run_queues (kernel/time/hrtimer.c:1928)
[ 696.522062][ C0] update_process_times (kernel/time/timer.c:2065 (discriminator 2) kernel/time/timer.c:2086 (discriminator 2))
[ 696.522062][ C0] tick_periodic+0xb0/0xb2
[ 696.522062][ C0] tick_handle_periodic (kernel/time/tick-common.c:145)
[ 696.522062][ C0] ? sysvec_hyperv_stimer0 (arch/x86/kernel/apic/apic.c:1076)
[ 696.522062][ C0] __sysvec_apic_timer_interrupt (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/jump_label.h:260 include/linux/jump_label.h:270 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1083)
[ 696.522062][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 4))
[ 696.522062][ C0] handle_exception (arch/x86/entry/entry_32.S:1056)
[ 696.522062][ C0] EIP: tick_clock_notify (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/time/tick-sched.c:1607)
[ 696.522062][ C0] Code: 15 d4 1b 8e 98 8b 45 e4 8b 55 e8 a3 d8 1b 8e 98 89 15 dc 1b 8e 98 89 35 c0 1b 8e 98 89 1d c4 1b 8e 98 83 c4 10 5b 5e 5f 5d c3 <80> 0d e4 1b 8e 98 01 c3 80 0d e4 1b 8e 98 01 c3 0f ba 35 e4 1b 8e
All code
========
0: 15 d4 1b 8e 98 adc $0x988e1bd4,%eax
5: 8b 45 e4 mov -0x1c(%rbp),%eax
8: 8b 55 e8 mov -0x18(%rbp),%edx
b: a3 d8 1b 8e 98 89 15 movabs %eax,0x1bdc1589988e1bd8
12: dc 1b
14: 8e 98 89 35 c0 1b mov 0x1bc03589(%rax),%ds
1a: 8e 98 89 1d c4 1b mov 0x1bc41d89(%rax),%ds
20: 8e 98 83 c4 10 5b mov 0x5b10c483(%rax),%ds
26: 5e pop %rsi
27: 5f pop %rdi
28: 5d pop %rbp
29: c3 ret
2a:* 80 0d e4 1b 8e 98 01 orb $0x1,-0x6771e41c(%rip) # 0xffffffff988e1c15 <-- trapping instruction
31: c3 ret
32: 80 0d e4 1b 8e 98 01 orb $0x1,-0x6771e41c(%rip) # 0xffffffff988e1c1d
39: c3 ret
3a: 0f .byte 0xf
3b: ba 35 e4 1b 8e mov $0x8e1be435,%edx
Code starting with the faulting instruction
===========================================
0: 80 0d e4 1b 8e 98 01 orb $0x1,-0x6771e41c(%rip) # 0xffffffff988e1beb
7: c3 ret
8: 80 0d e4 1b 8e 98 01 orb $0x1,-0x6771e41c(%rip) # 0xffffffff988e1bf3
f: c3 ret
10: 0f .byte 0xf
11: ba 35 e4 1b 8e mov $0x8e1be435,%edx
[1]
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240620/202406201623.d1a3b599-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Powered by blists - more mailing lists