Message-ID: <202503280925.27fefb28-lkp@intel.com>
Date: Fri, 28 Mar 2025 09:24:05 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	<oliver.sang@...el.com>
Subject: [peterz-queue:sched/hrtick] [entry,hrtimer,x86] c07c4e0c01:
 BUG:soft_lockup-CPU##stuck_for#s![schbench:#]



Hello,

kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![schbench:#]" on:

commit: c07c4e0c013dc11dd466fa63a4af12ef8282b27b ("entry,hrtimer,x86: Push reprogramming timers into the interrupt return path")
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/hrtick

in testcase: schbench
version: schbench-x86_64-48aed1d-1_20241103
with the following parameters:

	iterations: 3x
	message_threads: 10%
	worker_threads: 128
	runtime: 300s
	cpufreq_governor: performance



config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)
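
When going through a saved copy of the dmesg, the soft lockup record can be located with a short script. This is a minimal sketch, assuming the watchdog line format shown in this report; the function name `find_soft_lockups` and the returned field names are hypothetical and not part of any lkp-tests tooling.

```python
import re

# Matches the watchdog record as printed in this report, e.g.
# "watchdog: BUG: soft lockup - CPU#17 stuck for 22s! [schbench:4939]"
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(dmesg_text):
    """Return one dict per soft lockup record found in the log text."""
    records = []
    for match in SOFT_LOCKUP_RE.finditer(dmesg_text):
        records.append({
            "cpu": int(match.group("cpu")),
            "seconds": int(match.group("secs")),
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
        })
    return records
```

Feeding it the full log text yields the CPU number, the stuck duration in seconds, and the task name and PID for each record, which makes it easier to compare repeated runs.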



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202503280925.27fefb28-lkp@intel.com


[  120.056174][   C17] watchdog: BUG: soft lockup - CPU#17 stuck for 22s! [schbench:4939]
[  120.056179][   C17] Modules linked in: kmem intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common device_dax nd_pmem nd_btt dax_pmem i10nm_edac skx_edac_common x86_pkg_temp_thermal intel_powerclamp coretemp btrfs blake2b_generic xor raid6_pq sd_mod kvm_intel sg kvm snd_pcm ast snd_timer dax_hmem ghash_clmulni_intel rapl drm_client_lib ahci cxl_acpi snd ipmi_ssif drm_shmem_helper intel_cstate isst_if_mmio isst_if_mbox_pci acpi_power_meter cxl_port libahci binfmt_misc intel_th_gth cxl_core mei_me soundcore ipmi_si ioatdma i2c_i801 intel_th_pci intel_uncore einj acpi_ipmi pcspkr libata mei isst_if_common drm_kms_helper i2c_smbus intel_pch_thermal intel_vsec intel_th dca wmi nfit ipmi_devintf libnvdimm ipmi_msghandler acpi_pad joydev drm fuse dm_mod loop ip_tables
[  120.056218][   C17] CPU: 17 UID: 0 PID: 4939 Comm: schbench Tainted: G S                 6.14.0-01502-gc07c4e0c013d #1 VOLUNTARY
[  120.056221][   C17] Tainted: [S]=CPU_OUT_OF_SPEC
[  120.056222][   C17] Hardware name: Intel Corporation M50CYP2SB1U/M50CYP2SB1U, BIOS SE5C620.86B.01.01.0003.2104260124 04/26/2021
[ 120.056223][ C17] RIP: 0010:native_queued_spin_lock_slowpath (kernel/locking/qspinlock.c:474) 
[ 120.056234][ C17] Code: c1 e9 12 83 e0 03 83 e9 01 48 c1 e0 05 48 63 c9 48 05 80 2b e5 83 48 03 04 cd e0 cc bc 82 48 89 10 8b 42 08 85 c0 75 09 f3 90 <8b> 42 08 85 c0 74 f7 48 8b 0a 48 85 c9 74 90 0f 0d 09 eb 91 8b 03
All code
========
   0:	c1 e9 12             	shr    $0x12,%ecx
   3:	83 e0 03             	and    $0x3,%eax
   6:	83 e9 01             	sub    $0x1,%ecx
   9:	48 c1 e0 05          	shl    $0x5,%rax
   d:	48 63 c9             	movslq %ecx,%rcx
  10:	48 05 80 2b e5 83    	add    $0xffffffff83e52b80,%rax
  16:	48 03 04 cd e0 cc bc 	add    -0x7d433320(,%rcx,8),%rax
  1d:	82 
  1e:	48 89 10             	mov    %rdx,(%rax)
  21:	8b 42 08             	mov    0x8(%rdx),%eax
  24:	85 c0                	test   %eax,%eax
  26:	75 09                	jne    0x31
  28:	f3 90                	pause
  2a:*	8b 42 08             	mov    0x8(%rdx),%eax		<-- trapping instruction
  2d:	85 c0                	test   %eax,%eax
  2f:	74 f7                	je     0x28
  31:	48 8b 0a             	mov    (%rdx),%rcx
  34:	48 85 c9             	test   %rcx,%rcx
  37:	74 90                	je     0xffffffffffffffc9
  39:	0f 0d 09             	prefetchw (%rcx)
  3c:	eb 91                	jmp    0xffffffffffffffcf
  3e:	8b 03                	mov    (%rbx),%eax

Code starting with the faulting instruction
===========================================
   0:	8b 42 08             	mov    0x8(%rdx),%eax
   3:	85 c0                	test   %eax,%eax
   5:	74 f7                	je     0xfffffffffffffffe
   7:	48 8b 0a             	mov    (%rdx),%rcx
   a:	48 85 c9             	test   %rcx,%rcx
   d:	74 90                	je     0xffffffffffffff9f
   f:	0f 0d 09             	prefetchw (%rcx)
  12:	eb 91                	jmp    0xffffffffffffffa5
  14:	8b 03                	mov    (%rbx),%eax
[  120.056236][   C17] RSP: 0000:ffa00000222dfd68 EFLAGS: 00000246
[  120.056238][   C17] RAX: 0000000000000000 RBX: ffd40000055f6568 RCX: 000000000000002a
[  120.056239][   C17] RDX: ff1100103f671b80 RSI: 0000000000ac0101 RDI: ffd40000055f6568
[  120.056241][   C17] RBP: ff1100103f671b80 R08: 0000000000000000 R09: 0000000000000000
[  120.056242][   C17] R10: 0000000055555554 R11: ff11000240ff850c R12: 0000000000480000
[  120.056242][   C17] R13: 0000000000480000 R14: 0200000000000000 R15: 0000000000000000
[  120.056243][   C17] FS:  00007f75844266c0(0000) GS:ff110010bb81f000(0000) knlGS:0000000000000000
[  120.056245][   C17] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  120.056246][   C17] CR2: 00007f76e0415c70 CR3: 00000001f83fc002 CR4: 0000000000773ef0
[  120.056247][   C17] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  120.056247][   C17] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  120.056248][   C17] PKRU: 55555554
[  120.056249][   C17] Call Trace:
[  120.056250][   C17]  <TASK>
[ 120.056252][ C17] _raw_spin_lock (arch/x86/include/asm/paravirt.h:572 arch/x86/include/asm/qspinlock.h:51 include/asm-generic/qspinlock.h:114 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154) 
[ 120.056254][ C17] do_huge_pmd_numa_page (mm/huge_memory.c:1976) 
[ 120.056259][ C17] __handle_mm_fault (mm/memory.c:6014) 
[ 120.056264][ C17] handle_mm_fault (mm/memory.c:6197) 
[ 120.056266][ C17] do_user_addr_fault (arch/x86/mm/fault.c:1338) 
[ 120.056272][ C17] exc_page_fault (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:92 arch/x86/mm/fault.c:1488 arch/x86/mm/fault.c:1538) 
[ 120.056275][ C17] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:623) 
[  120.056278][   C17] RIP: 0033:0x55f6cc692d8b
[ 120.056280][ C17] Code: e3 ff ff 8b 05 86 82 00 00 85 c0 0f 84 f7 02 00 00 48 8b 15 b7 33 00 00 31 db 48 85 d2 0f 84 30 01 00 00 4c 8b 15 55 82 00 00 <4d> 8b b7 70 98 10 00 4d 89 d5 4e 8d 1c d5 00 00 00 00 4d 0f af ea
All code
========
   0:	e3 ff                	jrcxz  0x1
   2:	ff 8b 05 86 82 00    	decl   0x828605(%rbx)
   8:	00 85 c0 0f 84 f7    	add    %al,-0x87bf040(%rbp)
   e:	02 00                	add    (%rax),%al
  10:	00 48 8b             	add    %cl,-0x75(%rax)
  13:	15 b7 33 00 00       	adc    $0x33b7,%eax
  18:	31 db                	xor    %ebx,%ebx
  1a:	48 85 d2             	test   %rdx,%rdx
  1d:	0f 84 30 01 00 00    	je     0x153
  23:	4c 8b 15 55 82 00 00 	mov    0x8255(%rip),%r10        # 0x827f
  2a:*	4d 8b b7 70 98 10 00 	mov    0x109870(%r15),%r14		<-- trapping instruction
  31:	4d 89 d5             	mov    %r10,%r13
  34:	4e 8d 1c d5 00 00 00 	lea    0x0(,%r10,8),%r11
  3b:	00 
  3c:	4d 0f af ea          	imul   %r10,%r13

Code starting with the faulting instruction
===========================================
   0:	4d 8b b7 70 98 10 00 	mov    0x109870(%r15),%r14
   7:	4d 89 d5             	mov    %r10,%r13
   a:	4e 8d 1c d5 00 00 00 	lea    0x0(,%r10,8),%r11
  11:	00 
  12:	4d 0f af ea          	imul   %r10,%r13
[  120.056281][   C17] RSP: 002b:00007f7584425df0 EFLAGS: 00010206
[  120.056282][   C17] RAX: 0000000000000000 RBX: 000055f6e61615d0 RCX: 0000000000000000
[  120.056283][   C17] RDX: 0000000000000005 RSI: 0000000000000000 RDI: 000055f6e61615d0
[  120.056284][   C17] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  120.056284][   C17] R10: 0000000000000068 R11: 0000000000000293 R12: 00007f76e0315c70
[  120.056285][   C17] R13: 0000000000000011 R14: 00007f76e030c420 R15: 00007f76e030c400
[  120.056287][   C17]  </TASK>
[  120.056288][   C17] Kernel panic - not syncing: softlockup: hung tasks
[  120.410327][   C17] CPU: 17 UID: 0 PID: 4939 Comm: schbench Tainted: G S           L     6.14.0-01502-gc07c4e0c013d #1 VOLUNTARY
[  120.422640][   C17] Tainted: [S]=CPU_OUT_OF_SPEC, [L]=SOFTLOCKUP
[  120.428974][   C17] Hardware name: Intel Corporation M50CYP2SB1U/M50CYP2SB1U, BIOS SE5C620.86B.01.01.0003.2104260124 04/26/2021
[  120.441111][   C17] Call Trace:
[  120.444577][   C17]  <IRQ>
[ 120.447593][ C17] panic (kernel/panic.c:354) 
[ 120.451654][ C17] watchdog_timer_fn (kernel/watchdog.c:733) 
[ 120.456739][ C17] ? __pfx_watchdog_timer_fn (kernel/watchdog.c:683) 
[ 120.462344][ C17] __hrtimer_run_queues (kernel/time/hrtimer.c:1799 kernel/time/hrtimer.c:1863) 
[ 120.467684][ C17] hrtimer_interrupt (kernel/time/hrtimer.c:1960) 
[ 120.472753][ C17] __sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1038 arch/x86/kernel/apic/apic.c:1055) 
[ 120.478688][ C17] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 arch/x86/kernel/apic/apic.c:1049) 
[  120.484437][   C17]  </IRQ>
[  120.487494][   C17]  <TASK>
[ 120.490535][ C17] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702) 
[ 120.496622][ C17] RIP: 0010:native_queued_spin_lock_slowpath (kernel/locking/qspinlock.c:474) 
[ 120.503754][ C17] Code: c1 e9 12 83 e0 03 83 e9 01 48 c1 e0 05 48 63 c9 48 05 80 2b e5 83 48 03 04 cd e0 cc bc 82 48 89 10 8b 42 08 85 c0 75 09 f3 90 <8b> 42 08 85 c0 74 f7 48 8b 0a 48 85 c9 74 90 0f 0d 09 eb 91 8b 03
All code
========
   0:	c1 e9 12             	shr    $0x12,%ecx
   3:	83 e0 03             	and    $0x3,%eax
   6:	83 e9 01             	sub    $0x1,%ecx
   9:	48 c1 e0 05          	shl    $0x5,%rax
   d:	48 63 c9             	movslq %ecx,%rcx
  10:	48 05 80 2b e5 83    	add    $0xffffffff83e52b80,%rax
  16:	48 03 04 cd e0 cc bc 	add    -0x7d433320(,%rcx,8),%rax
  1d:	82 
  1e:	48 89 10             	mov    %rdx,(%rax)
  21:	8b 42 08             	mov    0x8(%rdx),%eax
  24:	85 c0                	test   %eax,%eax
  26:	75 09                	jne    0x31
  28:	f3 90                	pause
  2a:*	8b 42 08             	mov    0x8(%rdx),%eax		<-- trapping instruction
  2d:	85 c0                	test   %eax,%eax
  2f:	74 f7                	je     0x28
  31:	48 8b 0a             	mov    (%rdx),%rcx
  34:	48 85 c9             	test   %rcx,%rcx
  37:	74 90                	je     0xffffffffffffffc9
  39:	0f 0d 09             	prefetchw (%rcx)
  3c:	eb 91                	jmp    0xffffffffffffffcf
  3e:	8b 03                	mov    (%rbx),%eax

Code starting with the faulting instruction
===========================================
   0:	8b 42 08             	mov    0x8(%rdx),%eax
   3:	85 c0                	test   %eax,%eax
   5:	74 f7                	je     0xfffffffffffffffe
   7:	48 8b 0a             	mov    (%rdx),%rcx
   a:	48 85 c9             	test   %rcx,%rcx
   d:	74 90                	je     0xffffffffffffff9f
   f:	0f 0d 09             	prefetchw (%rcx)
  12:	eb 91                	jmp    0xffffffffffffffa5
  14:	8b 03                	mov    (%rbx),%eax
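
The decoded bytes above suggest that the CPU was spinning in a tight wait loop: the `je` at offset 0x2f jumps back to the `pause` at 0x28 whenever the 32-bit value loaded from `0x8(%rdx)` is zero. The arithmetic behind those jump targets can be checked with a short sketch. The helper name `short_jump_target` is hypothetical, and it only handles the two-byte short-jump encodings (opcode plus signed 8-bit displacement) that appear in this dump.

```python
def short_jump_target(offset, displacement_byte):
    """Target of a 2-byte x86 short jump located at `offset`.

    The displacement is a signed 8-bit value relative to the address of
    the next instruction (offset + 2).
    """
    if displacement_byte >= 0x80:
        disp = displacement_byte - 0x100  # sign-extend the 8-bit value
    else:
        disp = displacement_byte
    return offset + 2 + disp


# "2f: 74 f7  je 0x28": back to the pause instruction at 0x28
print(hex(short_jump_target(0x2F, 0xF7)))
# "26: 75 09  jne 0x31": skips the loop when the value is already non-zero
print(hex(short_jump_target(0x26, 0x09)))
```

Targets that fall before offset 0 of the dumped bytes come out negative here; the decoder in the log prints them as wrapped 64-bit values such as 0xffffffffffffffc9.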


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250328/202503280925.27fefb28-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

