lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <202406201623.d1a3b599-lkp@intel.com>
Date: Thu, 20 Jun 2024 16:46:37 +0800
From: kernel test robot <oliver.sang@...el.com>
To: "Ricardo B. Marliere" <ricardo@...liere.net>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Tejun Heo <tj@...nel.org>,
	<oliver.sang@...el.com>
Subject: [linus:master] [workqueue]  5df9197edd:
 BUG:soft_lockup-CPU##stuck_for#s![swapper:#]



Hello,

kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![swapper:#]" on:

commit: 5df9197edd731a44682444631d68c2384f381bf1 ("workqueue: make wq_subsys const")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master      d20f6b3d747c36889b7ce75ee369182af3decb6b]
[test failed on linux-next/master 6906a84c482f098d31486df8dc98cead21cce2d0]

in testcase: boot

compiler: gcc-13
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


we don't have enough knowledge why this commit could cause random issues,
while the parent keeps clean even when we run both till 999 times.

=========================================================================================
compiler/kconfig/rootfs/sleep/tbox_group/testcase:
  gcc-13/i386-randconfig-r015-20211011/debian-11.1-i386-20220923.cgz/1/vm-snb/boot

f297a3844aa059c5 5df9197edd731a44682444631d6
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :999          6%          56:999   dmesg.BUG:kernel_hang_in_boot_stage
           :999          1%           8:999   dmesg.BUG:kernel_hang_in_test_stage
           :999          0%           1:999   dmesg.BUG:soft_lockup-CPU##stuck_for#s![kcompactd0:#]
           :999          0%           3:999   dmesg.BUG:soft_lockup-CPU##stuck_for#s![kworker##:#]
           :999          9%          89:999   dmesg.BUG:soft_lockup-CPU##stuck_for#s![swapper:#]
           :999         10%         102:999   dmesg.BUG:workqueue_lockup-pool
           :999          1%           7:999   dmesg.EIP:__do_softirq
           :999          0%           1:999   dmesg.EIP:__down_trylock_console_sem
           :999          1%          10:999   dmesg.EIP:_raw_spin_unlock_irq
           :999          0%           3:999   dmesg.EIP:_raw_spin_unlock_irqrestore
           :999          0%           2:999   dmesg.EIP:console_flush_all
           :999          0%           1:999   dmesg.EIP:console_unlock
           :999          1%           7:999   dmesg.EIP:lock_acquire
           :999          1%           7:999   dmesg.EIP:lock_release
           :999          1%          12:999   dmesg.EIP:raw_spin_rq_unlock_irq
           :999          0%           1:999   dmesg.EIP:rcu_check_gp_start_stall
           :999          0%           2:999   dmesg.EIP:seqcount_lockdep_reader_access
           :999          0%           1:999   dmesg.EIP:tcp_orphan_update
           :999          9%          88:999   dmesg.EIP:tick_clock_notify
           :999          0%           1:999   dmesg.EIP:vprintk_store
           :999          0%           1:999   dmesg.EIP:vsnprintf
           :999          2%          15:999   dmesg.INFO:rcu_preempt_detected_stalls_on_CPUs/tasks
           :999          0%           3:999   dmesg.INFO:rcu_preempt_self-detected_stall_on_CPU
           :999          3%          29:999   dmesg.INFO:task_blocked_for_more_than#seconds
           :999          2%          16:999   dmesg.IP-Config:Auto-configuration_of_network_failed
           :999          3%          29:999   dmesg.Kernel_panic-not_syncing:hung_task:blocked_tasks
           :999          9%          93:999   dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
           :999          0%           1:999   dmesg.WARNING:at_kernel/rcu/tree_stall.h:#rcu_check_gp_start_stall

one hint we could supply is the config for this test is a randconfig which is
uploaded to [1]

at the same time, we also tried another config as attached
  not-reproduce-config-6.8.0-rc2-00006-g5df9197edd73

both parent/this commit seem clean in the tests for this config.

=========================================================================================
compiler/kconfig/rootfs/sleep/tbox_group/testcase:
  gcc-13/i386-debian-10.3-func/debian-11.1-i386-20220923.cgz/1/vm-snb/boot

f297a3844aa059c5 5df9197edd731a44682444631d6
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
        999:999          0%         999:999   pass


but since the diff between two configs are very big, it's hard for us to
supply further information.

just made this report FYI what we observed in our tests.
if you have any debug patch want us to test, please let us know. Thanks


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202406201623.d1a3b599-lkp@intel.com


[  696.522062][    C0] watchdog: BUG: soft lockup - CPU#0 stuck for 626s! [swapper:1]
[  696.522062][    C0] Modules linked in:
[  696.522062][    C0] irq event stamp: 110942
[ 696.522062][ C0] hardirqs last enabled at (110941): timekeeping_notify (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 arch/x86/include/asm/irqflags.h:135 include/linux/stop_machine.h:154 include/linux/stop_machine.h:161 kernel/time/timekeeping.c:1506) 
[ 696.522062][ C0] hardirqs last disabled at (110942): sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076) 
[ 696.522062][ C0] softirqs last enabled at (110790): __do_softirq (arch/x86/include/asm/preempt.h:26 kernel/softirq.c:400 kernel/softirq.c:582) 
[ 696.522062][ C0] softirqs last disabled at (110779): do_softirq_own_stack (arch/x86/kernel/irq_32.c:57 arch/x86/kernel/irq_32.c:147) 
[  696.522062][    C0] CPU: 0 PID: 1 Comm: swapper Not tainted 6.8.0-rc2-00006-g5df9197edd73 #1
[ 696.522062][ C0] EIP: tick_clock_notify (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/time/tick-sched.c:1607) 
[ 696.522062][ C0] Code: 15 d4 1b 8e 98 8b 45 e4 8b 55 e8 a3 d8 1b 8e 98 89 15 dc 1b 8e 98 89 35 c0 1b 8e 98 89 1d c4 1b 8e 98 83 c4 10 5b 5e 5f 5d c3 <80> 0d e4 1b 8e 98 01 c3 80 0d e4 1b 8e 98 01 c3 0f ba 35 e4 1b 8e
All code
========
   0:	15 d4 1b 8e 98       	adc    $0x988e1bd4,%eax
   5:	8b 45 e4             	mov    -0x1c(%rbp),%eax
   8:	8b 55 e8             	mov    -0x18(%rbp),%edx
   b:	a3 d8 1b 8e 98 89 15 	movabs %eax,0x1bdc1589988e1bd8
  12:	dc 1b 
  14:	8e 98 89 35 c0 1b    	mov    0x1bc03589(%rax),%ds
  1a:	8e 98 89 1d c4 1b    	mov    0x1bc41d89(%rax),%ds
  20:	8e 98 83 c4 10 5b    	mov    0x5b10c483(%rax),%ds
  26:	5e                   	pop    %rsi
  27:	5f                   	pop    %rdi
  28:	5d                   	pop    %rbp
  29:	c3                   	ret
  2a:*	80 0d e4 1b 8e 98 01 	orb    $0x1,-0x6771e41c(%rip)        # 0xffffffff988e1c15		<-- trapping instruction
  31:	c3                   	ret
  32:	80 0d e4 1b 8e 98 01 	orb    $0x1,-0x6771e41c(%rip)        # 0xffffffff988e1c1d
  39:	c3                   	ret
  3a:	0f                   	.byte 0xf
  3b:	ba 35 e4 1b 8e       	mov    $0x8e1be435,%edx

Code starting with the faulting instruction
===========================================
   0:	80 0d e4 1b 8e 98 01 	orb    $0x1,-0x6771e41c(%rip)        # 0xffffffff988e1beb
   7:	c3                   	ret
   8:	80 0d e4 1b 8e 98 01 	orb    $0x1,-0x6771e41c(%rip)        # 0xffffffff988e1bf3
   f:	c3                   	ret
  10:	0f                   	.byte 0xf
  11:	ba 35 e4 1b 8e       	mov    $0x8e1be435,%edx
[  696.522062][    C0] EAX: 0001b15d EBX: 984678a0 ECX: 00000006 EDX: 83f6d220
[  696.522062][    C0] ESI: 988e0820 EDI: 00000001 EBP: 82d17ed8 ESP: 82d17ec0
[  696.522062][    C0] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00000202
[  696.522062][    C0] CR0: 80050033 CR2: ffda9000 CR3: 18c90000 CR4: 000406d0
[  696.522062][    C0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  696.522062][    C0] DR6: fffe0ff0 DR7: 00000400
[  696.522062][    C0] Call Trace:
[ 696.522062][ C0] ? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1)) 
[ 696.522062][ C0] ? watchdog_timer_fn (kernel/watchdog.c:545) 
[ 696.522062][ C0] ? __hrtimer_run_queues+0x10b/0x196 
[ 696.522062][ C0] ? schedule_work (drivers/usb/core/hub.c:890) 
[ 696.522062][ C0] ? hrtimer_run_queues (kernel/time/hrtimer.c:1928) 
[ 696.522062][ C0] ? update_process_times (kernel/time/timer.c:2065 (discriminator 2) kernel/time/timer.c:2086 (discriminator 2)) 
[ 696.522062][ C0] ? tick_periodic+0xb0/0xb2 
[ 696.522062][ C0] ? tick_handle_periodic (kernel/time/tick-common.c:145) 
[ 696.522062][ C0] ? sysvec_hyperv_stimer0 (arch/x86/kernel/apic/apic.c:1076) 
[ 696.522062][ C0] ? __sysvec_apic_timer_interrupt (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/jump_label.h:260 include/linux/jump_label.h:270 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1083) 
[ 696.522062][ C0] ? sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 4)) 
[ 696.522062][ C0] ? handle_exception (arch/x86/entry/entry_32.S:1056) 
[ 696.522062][ C0] ? sysvec_hyperv_stimer0 (arch/x86/kernel/apic/apic.c:1076) 
[ 696.522062][ C0] ? tick_cancel_sched_timer (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/time/tick-sched.c:1607) 
[ 696.522062][ C0] ? sysvec_hyperv_stimer0 (arch/x86/kernel/apic/apic.c:1076) 
[ 696.522062][ C0] ? tick_cancel_sched_timer (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/time/tick-sched.c:1607) 
[ 696.522062][ C0] ? timekeeping_notify (kernel/time/timekeeping.c:1508) 
[ 696.522062][ C0] __clocksource_select (kernel/time/clocksource.c:1047 (discriminator 1)) 
[ 696.522062][ C0] ? boot_override_clock (kernel/time/clocksource.c:1079) 
[ 696.522062][ C0] clocksource_select (kernel/time/clocksource.c:1064) 
[ 696.522062][ C0] clocksource_done_booting (kernel/time/clocksource.c:1088) 
[ 696.522062][ C0] do_one_initcall (init/main.c:1236) 
[ 696.522062][ C0] ? parse_args (kernel/params.c:168) 
[ 696.522062][ C0] do_initcalls (init/main.c:1297 (discriminator 3) init/main.c:1314 (discriminator 3)) 
[ 696.522062][ C0] ? rest_init (init/main.c:1433) 
[ 696.522062][ C0] kernel_init_freeable (init/main.c:1553) 
[ 696.522062][ C0] kernel_init (init/main.c:1443) 
[ 696.522062][ C0] ret_from_fork (arch/x86/kernel/process.c:153) 
[ 696.522062][ C0] ? rest_init (init/main.c:1433) 
[ 696.522062][ C0] ret_from_fork_asm (arch/x86/entry/entry_32.S:741) 
[ 696.522062][ C0] entry_INT80_32 (arch/x86/entry/entry_32.S:947) 
[  696.522062][    C0] Kernel panic - not syncing: softlockup: hung tasks
[  696.522062][    C0] CPU: 0 PID: 1 Comm: swapper Tainted: G             L     6.8.0-rc2-00006-g5df9197edd73 #1
[  696.522062][    C0] Call Trace:
[ 696.522062][ C0] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) 
[ 696.522062][ C0] dump_stack (lib/dump_stack.c:114) 
[ 696.522062][ C0] panic (kernel/panic.c:362) 
[ 696.522062][ C0] watchdog_timer_fn (kernel/watchdog.c:519) 
[ 696.522062][ C0] __hrtimer_run_queues+0x10b/0x196 
[ 696.522062][ C0] ? schedule_work (drivers/usb/core/hub.c:890) 
[ 696.522062][ C0] hrtimer_run_queues (kernel/time/hrtimer.c:1928) 
[ 696.522062][ C0] update_process_times (kernel/time/timer.c:2065 (discriminator 2) kernel/time/timer.c:2086 (discriminator 2)) 
[ 696.522062][ C0] tick_periodic+0xb0/0xb2 
[ 696.522062][ C0] tick_handle_periodic (kernel/time/tick-common.c:145) 
[ 696.522062][ C0] ? sysvec_hyperv_stimer0 (arch/x86/kernel/apic/apic.c:1076) 
[ 696.522062][ C0] __sysvec_apic_timer_interrupt (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/jump_label.h:260 include/linux/jump_label.h:270 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1083) 
[ 696.522062][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 4)) 
[ 696.522062][ C0] handle_exception (arch/x86/entry/entry_32.S:1056) 
[ 696.522062][ C0] EIP: tick_clock_notify (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/time/tick-sched.c:1607) 
[ 696.522062][ C0] Code: 15 d4 1b 8e 98 8b 45 e4 8b 55 e8 a3 d8 1b 8e 98 89 15 dc 1b 8e 98 89 35 c0 1b 8e 98 89 1d c4 1b 8e 98 83 c4 10 5b 5e 5f 5d c3 <80> 0d e4 1b 8e 98 01 c3 80 0d e4 1b 8e 98 01 c3 0f ba 35 e4 1b 8e
All code
========
   0:	15 d4 1b 8e 98       	adc    $0x988e1bd4,%eax
   5:	8b 45 e4             	mov    -0x1c(%rbp),%eax
   8:	8b 55 e8             	mov    -0x18(%rbp),%edx
   b:	a3 d8 1b 8e 98 89 15 	movabs %eax,0x1bdc1589988e1bd8
  12:	dc 1b 
  14:	8e 98 89 35 c0 1b    	mov    0x1bc03589(%rax),%ds
  1a:	8e 98 89 1d c4 1b    	mov    0x1bc41d89(%rax),%ds
  20:	8e 98 83 c4 10 5b    	mov    0x5b10c483(%rax),%ds
  26:	5e                   	pop    %rsi
  27:	5f                   	pop    %rdi
  28:	5d                   	pop    %rbp
  29:	c3                   	ret
  2a:*	80 0d e4 1b 8e 98 01 	orb    $0x1,-0x6771e41c(%rip)        # 0xffffffff988e1c15		<-- trapping instruction
  31:	c3                   	ret
  32:	80 0d e4 1b 8e 98 01 	orb    $0x1,-0x6771e41c(%rip)        # 0xffffffff988e1c1d
  39:	c3                   	ret
  3a:	0f                   	.byte 0xf
  3b:	ba 35 e4 1b 8e       	mov    $0x8e1be435,%edx

Code starting with the faulting instruction
===========================================
   0:	80 0d e4 1b 8e 98 01 	orb    $0x1,-0x6771e41c(%rip)        # 0xffffffff988e1beb
   7:	c3                   	ret
   8:	80 0d e4 1b 8e 98 01 	orb    $0x1,-0x6771e41c(%rip)        # 0xffffffff988e1bf3
   f:	c3                   	ret
  10:	0f                   	.byte 0xf
  11:	ba 35 e4 1b 8e       	mov    $0x8e1be435,%edx


[1]
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240620/202406201623.d1a3b599-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ