Message-ID: <202509081329.81f1ed82-lkp@intel.com>
Date: Mon, 8 Sep 2025 13:24:54 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
<oliver.sang@...el.com>
Subject: [peterz-queue:sched/hrtick] [entry,hrtimer,x86] ebf33ab570:
BUG:soft_lockup-CPU##stuck_for#s![pthread_mutex1_:#]
Hello,
kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![pthread_mutex1_:#]" on:
commit: ebf33ab5707c7c9ea25e3c03540b1329ad9aff1d ("entry,hrtimer,x86: Push reprogramming timers into the interrupt return path")
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/hrtick
in testcase: will-it-scale
version: will-it-scale-x86_64-75f66e4-1_20250906
with the following parameters:
	nr_task: 100%
	mode: thread
	test: pthread_mutex1
	cpufreq_governor: performance
config: x86_64-rhel-9.4
compiler: gcc-13
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202509081329.81f1ed82-lkp@intel.com
[ 138.658008][ C24] watchdog: BUG: soft lockup - CPU#24 stuck for 23s! [pthread_mutex1_:6650]
[ 138.658013][ C24] Modules linked in: ipmi_ssif intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac skx_edac_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel sd_mod sg binfmt_misc btrfs kvm blake2b_generic irqbypass snd_pcm xor ghash_clmulni_intel dax_hmem rapl raid6_pq snd_timer cxl_acpi ahci intel_cstate ast cxl_port snd drm_client_lib libahci nvme drm_shmem_helper cxl_core intel_th_gth soundcore mei_me isst_if_mmio isst_if_mbox_pci acpi_power_meter ioatdma i2c_i801 libata intel_uncore nvme_core intel_th_pci megaraid_sas einj pcspkr drm_kms_helper mei ipmi_si isst_if_common i2c_smbus acpi_ipmi intel_pch_thermal intel_vsec intel_th dca wmi ipmi_devintf ipmi_msghandler joydev drm fuse nfnetlink
[ 138.658060][ C24] CPU: 24 UID: 0 PID: 6650 Comm: pthread_mutex1_ Not tainted 6.17.0-rc4-00007-gebf33ab5707c #1 VOLUNTARY
[ 138.658063][ C24] Hardware name: Inspur NF5180M6/NF5180M6, BIOS 06.00.04 04/12/2022
[ 138.658065][ C24] RIP: 0010:native_queued_spin_lock_slowpath (kernel/locking/qspinlock.c:291 (discriminator 3))
[ 138.658077][ C24] Code: c1 e9 12 83 e0 03 83 e9 01 48 c1 e0 05 48 63 c9 48 05 80 20 e5 83 48 03 04 cd 20 3e bc 82 48 89 10 8b 42 08 85 c0 75 09 f3 90 <8b> 42 08 85 c0 74 f7 48 8b 0a 48 85 c9 74 81 0f 0d 09 e9 79 ff ff
All code
========
0: c1 e9 12 shr $0x12,%ecx
3: 83 e0 03 and $0x3,%eax
6: 83 e9 01 sub $0x1,%ecx
9: 48 c1 e0 05 shl $0x5,%rax
d: 48 63 c9 movslq %ecx,%rcx
10: 48 05 80 20 e5 83 add $0xffffffff83e52080,%rax
16: 48 03 04 cd 20 3e bc add -0x7d43c1e0(,%rcx,8),%rax
1d: 82
1e: 48 89 10 mov %rdx,(%rax)
21: 8b 42 08 mov 0x8(%rdx),%eax
24: 85 c0 test %eax,%eax
26: 75 09 jne 0x31
28: f3 90 pause
2a:* 8b 42 08 mov 0x8(%rdx),%eax <-- trapping instruction
2d: 85 c0 test %eax,%eax
2f: 74 f7 je 0x28
31: 48 8b 0a mov (%rdx),%rcx
34: 48 85 c9 test %rcx,%rcx
37: 74 81 je 0xffffffffffffffba
39: 0f 0d 09 prefetchw (%rcx)
3c: e9 .byte 0xe9
3d: 79 ff jns 0x3e
3f: ff .byte 0xff
Code starting with the faulting instruction
===========================================
0: 8b 42 08 mov 0x8(%rdx),%eax
3: 85 c0 test %eax,%eax
5: 74 f7 je 0xfffffffffffffffe
7: 48 8b 0a mov (%rdx),%rcx
a: 48 85 c9 test %rcx,%rcx
d: 74 81 je 0xffffffffffffff90
f: 0f 0d 09 prefetchw (%rcx)
12: e9 .byte 0xe9
13: 79 ff jns 0x14
15: ff .byte 0xff
[ 138.658079][ C24] RSP: 0018:ffa0000028f0fbe0 EFLAGS: 00000246
[ 138.658082][ C24] RAX: 0000000000000000 RBX: ff110002532e1204 RCX: 000000000000001b
[ 138.658083][ C24] RDX: ff11003fba032080 RSI: 0000000000700001 RDI: ff110002532e1204
[ 138.658085][ C24] RBP: ff11003fba032080 R08: 0000000000001200 R09: 00000000aba99bcc
[ 138.658086][ C24] R10: 0000000055565000 R11: 0000000001e15159 R12: 0000000000640000
[ 138.658088][ C24] R13: 0000000000640000 R14: ff11004075be6000 R15: 0000000000000000
[ 138.658089][ C24] FS: 00007fff967fc6c0(0000) GS:ff110040361e0000(0000) knlGS:0000000000000000
[ 138.658091][ C24] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 138.658093][ C24] CR2: 0000000000479ea0 CR3: 00000040483ec005 CR4: 0000000000773ef0
[ 138.658094][ C24] PKRU: 55555554
[ 138.658095][ C24] Call Trace:
[ 138.658098][ C24] <TASK>
[ 138.658101][ C24] _raw_spin_lock (arch/x86/include/asm/paravirt.h:557 arch/x86/include/asm/qspinlock.h:51 include/asm-generic/qspinlock.h:114 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[ 138.658104][ C24] futex_wait_setup (include/linux/uaccess.h:244 (discriminator 1) include/linux/uaccess.h:261 (discriminator 1) kernel/futex/futex.h:336 (discriminator 1) kernel/futex/waitwake.c:627 (discriminator 1))
[ 138.658111][ C24] __futex_wait (kernel/futex/waitwake.c:683)
[ 138.658114][ C24] ? __pfx_futex_wake_mark (kernel/futex/waitwake.c:135)
[ 138.658117][ C24] futex_wait (kernel/futex/waitwake.c:715)
[ 138.658119][ C24] ? do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[ 138.658127][ C24] ? do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[ 138.658130][ C24] do_futex (kernel/futex/syscalls.c:102 (discriminator 1))
[ 138.658132][ C24] __x64_sys_futex (kernel/futex/syscalls.c:179 kernel/futex/syscalls.c:160 kernel/futex/syscalls.c:160)
[ 138.658134][ C24] ? futex_wake (kernel/futex/waitwake.c:163)
[ 138.658138][ C24] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[ 138.658141][ C24] ? do_futex (kernel/futex/syscalls.c:107 (discriminator 1))
[ 138.658142][ C24] ? __x64_sys_futex (kernel/futex/syscalls.c:179 kernel/futex/syscalls.c:160 kernel/futex/syscalls.c:160)
[ 138.658144][ C24] ? do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[ 138.658146][ C24] ? do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[ 138.658149][ C24] ? do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[ 138.658151][ C24] ? clear_bhb_loop (arch/x86/entry/entry_64.S:1548)
[ 138.658158][ C24] ? clear_bhb_loop (arch/x86/entry/entry_64.S:1548)
[ 138.658160][ C24] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 138.658162][ C24] RIP: 0033:0x7ffff7de9eab
[ 138.658165][ C24] Code: 07 41 89 f0 83 f8 02 74 0b b8 02 00 00 00 87 07 85 c0 74 3b 44 89 c6 45 31 d2 ba 02 00 00 00 b8 ca 00 00 00 40 80 f6 80 0f 05 <48> 3d 00 f0 ff ff 76 d7 83 f8 f5 74 d2 83 f8 fc 74 cd 50 48 8d 3d
All code
========
0: 07 (bad)
1: 41 89 f0 mov %esi,%r8d
4: 83 f8 02 cmp $0x2,%eax
7: 74 0b je 0x14
9: b8 02 00 00 00 mov $0x2,%eax
e: 87 07 xchg %eax,(%rdi)
10: 85 c0 test %eax,%eax
12: 74 3b je 0x4f
14: 44 89 c6 mov %r8d,%esi
17: 45 31 d2 xor %r10d,%r10d
1a: ba 02 00 00 00 mov $0x2,%edx
1f: b8 ca 00 00 00 mov $0xca,%eax
24: 40 80 f6 80 xor $0x80,%sil
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 76 d7 jbe 0x9
32: 83 f8 f5 cmp $0xfffffff5,%eax
35: 74 d2 je 0x9
37: 83 f8 fc cmp $0xfffffffc,%eax
3a: 74 cd je 0x9
3c: 50 push %rax
3d: 48 rex.W
3e: 8d .byte 0x8d
3f: 3d .byte 0x3d
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 76 d7 jbe 0xffffffffffffffdf
8: 83 f8 f5 cmp $0xfffffff5,%eax
b: 74 d2 je 0xffffffffffffffdf
d: 83 f8 fc cmp $0xfffffffc,%eax
10: 74 cd je 0xffffffffffffffdf
12: 50 push %rax
13: 48 rex.W
14: 8d .byte 0x8d
15: 3d .byte 0x3d
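As a reading aid for the register dump above, here is a minimal sketch that splits a 32-bit qspinlock word into its fields. It assumes the layout used when NR_CPUS < 16K (locked byte in bits 0-7, pending in bit 8, tail index in bits 16-17, tail CPU + 1 from bit 18 up), which agrees with the "shr $0x12" / "and $0x3" / "sub $0x1" sequence in the disassembly. It also assumes that RSI still holds the lock value passed to the slowpath, which the log does not confirm. The function name decode_qspinlock is hypothetical and not part of any kernel tooling.

```python
# Sketch only: field positions assume the NR_CPUS < 16K qspinlock layout.
LOCKED_MASK = 0xFF       # bits 0-7: locked byte
PENDING_BIT = 1 << 8     # bit 8: pending
TAIL_IDX_SHIFT = 16      # bits 16-17: per-CPU MCS node index
TAIL_IDX_MASK = 0x3
TAIL_CPU_SHIFT = 18      # bits 18+: tail CPU number + 1 (0 means no tail)


def decode_qspinlock(val):
    """Split a 32-bit qspinlock word into its fields (hypothetical helper)."""
    tail_cpu_plus_one = val >> TAIL_CPU_SHIFT
    return {
        "locked": val & LOCKED_MASK,
        "pending": bool(val & PENDING_BIT),
        "tail_idx": (val >> TAIL_IDX_SHIFT) & TAIL_IDX_MASK,
        # The kernel stores cpu + 1 so that an all-zero tail means "no queue".
        "tail_cpu": tail_cpu_plus_one - 1 if tail_cpu_plus_one else None,
    }


if __name__ == "__main__":
    # RSI from the register dump, assumed to be the old lock value.
    print(decode_qspinlock(0x700001))
```

Under those assumptions, 0x700001 decodes to locked = 1, pending clear, tail index 0 and tail CPU 27, which matches RCX = 0x1b in the dump. That would mean CPU 24 queued itself behind CPU 27's MCS node and is spinning on its own node's locked field, the load at 0x8(%rdx) marked as the trapping instruction.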
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250908/202509081329.81f1ed82-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki