Message-ID: <202401031025.95761451-oliver.sang@intel.com>
Date: Wed, 3 Jan 2024 10:55:18 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Lai Jiangshan <jiangshanlai@...il.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	Tejun Heo <tj@...nel.org>, <Naohiro.Aota@....com>, Lai Jiangshan
	<jiangshan.ljs@...group.com>, Lai Jiangshan <jiangshanlai@...il.com>,
	<oliver.sang@...el.com>
Subject: Re: [PATCH 2/7] workqueue: Share the same PWQ for the CPUs of a pod



Hello,

kernel test robot noticed "WARNING:at_kernel/workqueue.c:#destroy_workqueue" on:

commit: 3f033de3cf87ef6c769b2d55ee1df715a982d650 ("[PATCH 2/7] workqueue: Share the same PWQ for the CPUs of a pod")
url: https://github.com/intel-lab-lkp/linux/commits/Lai-Jiangshan/workqueue-Reuse-the-default-PWQ-as-much-as-possible/20231227-225337
base: https://git.kernel.org/cgit/linux/kernel/git/tj/wq.git for-next
patch link: https://lore.kernel.org/all/20231227145143.2399-3-jiangshanlai@gmail.com/
patch subject: [PATCH 2/7] workqueue: Share the same PWQ for the CPUs of a pod

in testcase: hackbench
version: hackbench-x86_64-2.3-1_20220518
with the following parameters:

	nr_threads: 800%
	iterations: 4
	mode: threads
	ipc: pipe
	cpufreq_governor: performance



compiler: gcc-12
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory

(please refer to the attached dmesg/kmsg for the entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202401031025.95761451-oliver.sang@intel.com


[   30.471685][    T1] ------------[ cut here ]------------
[   30.476998][    T1] WARNING: CPU: 111 PID: 1 at kernel/workqueue.c:4842 destroy_workqueue (kernel/workqueue.c:4842 (discriminator 1))
[   30.486210][    T1] Modules linked in:
[   30.489964][    T1] CPU: 111 PID: 1 Comm: swapper/0 Not tainted 6.6.0-15761-g3f033de3cf87 #1
[   30.498396][    T1] Hardware name: Inspur NF8260M6/NF8260M6, BIOS 06.00.01 04/22/2022
[   30.506220][    T1] RIP: 0010:destroy_workqueue (kernel/workqueue.c:4842 (discriminator 1))
[   30.511794][    T1] Code: c2 75 f1 48 8b 43 08 48 39 98 a0 00 00 00 74 06 83 7b 18 01 7f 14 8b 43 5c 85 c0 75 0d 48 8b 53 68 48 8d 43 68 48 39 c2 74 4e <0f> 0b 48 c7 c6 e0 1d 42 82 48 8d 95 b0 00 00 00 48 c7 c7 68 a9 93
All code
========
   0:	c2 75 f1             	retq   $0xf175
   3:	48 8b 43 08          	mov    0x8(%rbx),%rax
   7:	48 39 98 a0 00 00 00 	cmp    %rbx,0xa0(%rax)
   e:	74 06                	je     0x16
  10:	83 7b 18 01          	cmpl   $0x1,0x18(%rbx)
  14:	7f 14                	jg     0x2a
  16:	8b 43 5c             	mov    0x5c(%rbx),%eax
  19:	85 c0                	test   %eax,%eax
  1b:	75 0d                	jne    0x2a
  1d:	48 8b 53 68          	mov    0x68(%rbx),%rdx
  21:	48 8d 43 68          	lea    0x68(%rbx),%rax
  25:	48 39 c2             	cmp    %rax,%rdx
  28:	74 4e                	je     0x78
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	48 c7 c6 e0 1d 42 82 	mov    $0xffffffff82421de0,%rsi
  33:	48 8d 95 b0 00 00 00 	lea    0xb0(%rbp),%rdx
  3a:	48                   	rex.W
  3b:	c7                   	.byte 0xc7
  3c:	c7                   	(bad)  
  3d:	68                   	.byte 0x68
  3e:	a9                   	.byte 0xa9
  3f:	93                   	xchg   %eax,%ebx

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	48 c7 c6 e0 1d 42 82 	mov    $0xffffffff82421de0,%rsi
   9:	48 8d 95 b0 00 00 00 	lea    0xb0(%rbp),%rdx
  10:	48                   	rex.W
  11:	c7                   	.byte 0xc7
  12:	c7                   	(bad)  
  13:	68                   	.byte 0x68
  14:	a9                   	.byte 0xa9
  15:	93                   	xchg   %eax,%ebx
[   30.531233][    T1] RSP: 0000:ffffc90000073dd8 EFLAGS: 00010002
[   30.537151][    T1] RAX: ffff88a444cd1000 RBX: ffff88a444ce6600 RCX: 0000000000000000
[   30.544968][    T1] RDX: ffff88a444ce665c RSI: 0000000000000286 RDI: ffff88a4444c4000
[   30.552785][    T1] RBP: ffff88a444cd1000 R08: 0004afcaac775f46 R09: 0004afcaac775f46
[   30.560605][    T1] R10: ffff88984f050840 R11: 0000000000008070 R12: ffff88a444cd1020
[   30.568430][    T1] R13: ffffc90000073e00 R14: 0000000000000462 R15: 0000000000000000
[   30.576246][    T1] FS:  0000000000000000(0000) GS:ffff88afcf8c0000(0000) knlGS:0000000000000000
[   30.585017][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   30.591447][    T1] CR2: 0000000000000000 CR3: 000000303e01c001 CR4: 00000000007706f0
[   30.599266][    T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   30.607085][    T1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   30.614910][    T1] PKRU: 55555554
[   30.618314][    T1] Call Trace:
[   30.621453][    T1]  <TASK>
[   30.624242][    T1]  ? destroy_workqueue (kernel/workqueue.c:4842 (discriminator 1))
[   30.629201][    T1]  ? __warn (kernel/panic.c:677)
[   30.633129][    T1]  ? destroy_workqueue (kernel/workqueue.c:4842 (discriminator 1))
[   30.638091][    T1]  ? report_bug (lib/bug.c:180 lib/bug.c:219)
[   30.642454][    T1]  ? handle_bug (arch/x86/kernel/traps.c:237)
[   30.646639][    T1]  ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[   30.651171][    T1]  ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[   30.656049][    T1]  ? destroy_workqueue (kernel/workqueue.c:4842 (discriminator 1))
[   30.661009][    T1]  ? destroy_workqueue (kernel/workqueue.c:4783 kernel/workqueue.c:4842)
[   30.665888][    T1]  ? __pfx_ftrace_check_sync (kernel/trace/ftrace.c:3803)
[   30.671200][    T1]  ftrace_check_sync (kernel/trace/ftrace.c:3808)
[   30.675820][    T1]  do_one_initcall (init/main.c:1236)
[   30.680354][    T1]  do_initcalls (init/main.c:1297 init/main.c:1314)
[   30.684625][    T1]  kernel_init_freeable (init/main.c:1555)
[   30.689678][    T1]  ? __pfx_kernel_init (init/main.c:1433)
[   30.694471][    T1]  kernel_init (init/main.c:1443)
[   30.698658][    T1]  ret_from_fork (arch/x86/kernel/process.c:147)
[   30.702927][    T1]  ? __pfx_kernel_init (init/main.c:1433)
[   30.707713][    T1]  ret_from_fork_asm (arch/x86/entry/entry_64.S:250)
[   30.712333][    T1]  </TASK>
[   30.715217][    T1] ---[ end trace 0000000000000000 ]---
[   30.720522][    T1] destroy_workqueue: ftrace_check_wq has the following busy pwq
[   30.728002][    T1]   pwq 452: cpus=0-223 node=3 flags=0x4 nice=0 active=0/256 refcnt=56
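
For readers following the disassembly above, here is a rough userspace sketch of the busy check that the trapping code appears to implement. It is a model, not the kernel code. The field names (refcnt, nr_in_flight, nr_active, inactive_works, dfl_pwq) follow kernel/workqueue.c as far as I can tell. The struct layout, the model_* names, the has_inactive_works flag and the check_pwq() helper are all hypothetical. The assumption that pwq 452 is not the workqueue's default pwq is inferred from the register dump (RAX equals RBP, which suggests the "refcnt > 1" branch was taken) and is not stated in the report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed value: nr_in_flight spans 0x1c..0x5c in the disassembly, i.e. 16 ints. */
#define WORK_NR_COLORS 16

struct model_pwq;

/* Hypothetical stand-in for struct workqueue_struct; only the default pwq matters here. */
struct model_wq {
	struct model_pwq *dfl_pwq;
};

/* Hypothetical stand-in for struct pool_workqueue; layout is NOT the kernel's. */
struct model_pwq {
	struct model_wq *wq;
	int refcnt;
	int nr_in_flight[WORK_NR_COLORS];
	int nr_active;
	bool has_inactive_works; /* models !list_empty(&pwq->inactive_works) */
};

/* Mirrors the sequence of tests visible in the decoded instructions. */
static bool model_pwq_busy(const struct model_pwq *pwq)
{
	/* Any work still in flight for any color keeps the pwq busy. */
	for (size_t i = 0; i < WORK_NR_COLORS; i++)
		if (pwq->nr_in_flight[i])
			return true;

	/* cmp %rbx,0xa0(%rax); cmpl $0x1,0x18(%rbx): non-default pwq with extra refs. */
	if (pwq != pwq->wq->dfl_pwq && pwq->refcnt > 1)
		return true;

	/* mov 0x5c(%rbx),%eax and the list-head compare at 0x68(%rbx). */
	if (pwq->nr_active || pwq->has_inactive_works)
		return true;

	return false;
}

/* Build an otherwise idle pwq with the given refcnt and run the check on it. */
static bool check_pwq(int refcnt, bool is_default)
{
	struct model_pwq dfl = { .refcnt = 1 };
	struct model_pwq other = { .refcnt = refcnt };
	struct model_wq wq = { .dfl_pwq = is_default ? &other : &dfl };

	dfl.wq = &wq;
	other.wq = &wq;
	return model_pwq_busy(&other);
}
```

With the values from the report (active=0, refcnt=56), check_pwq(56, false) reports busy, while the same refcnt on the default pwq, check_pwq(56, true), does not. If that reading is right, a pwq shared by the CPUs of a pod would hold one reference per CPU and trip the "refcnt > 1" test unless it is treated like the default pwq.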


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240103/202401031025.95761451-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

