Date:   Mon, 15 Mar 2021 22:04:41 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Petr Mladek <pmladek@...e.com>
Cc:     0day robot <lkp@...el.com>, LKML <linux-kernel@...r.kernel.org>,
        lkp@...ts.01.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Laurence Oberman <loberman@...hat.com>,
        Vincent Whitchurch <vincent.whitchurch@...s.com>,
        Michal Hocko <mhocko@...e.com>, Petr Mladek <pmladek@...e.com>
Subject: c928e9b143: BUG:soft_lockup-CPU##stuck_for#s![perf:#]



Greetings,

FYI, we noticed the following commit (built with gcc-9):

commit: c928e9b1439de4d74b942abd30d5c838a40af777 ("[PATCH v2 7/7] Test softlockup")
url: https://github.com/0day-ci/linux/commits/Petr-Mladek/watchdog-softlockup-Report-overall-time-and-some-cleanup/20210311-205501
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git a74e6a014c9d4d4161061f770c9b4f98372ac778

in testcase: stress-ng
version: stress-ng-x86_64-0.11-06_20210314
with the following parameters:

	nr_threads: 10%
	disk: 1HDD
	testtime: 60s
	fs: ext4
	class: filesystem
	test: binderfs
	cpufreq_governor: performance
	ucode: 0x42e



on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@...el.com>


[   70.666742] watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [perf:1794]
[   70.675062] Modules linked in: dm_mod xfs libcrc32c sd_mod t10_pi sg intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass mgag200 crct10dif_pclmul crc32_pclmul crc32c_intel drm_kms_helper ghash_clmulni_intel isci syscopyarea sysfillrect rapl sysimgblt libsas fb_sys_fops ahci intel_cstate ipmi_si scsi_transport_sas libahci mei_me ipmi_devintf ioatdma drm intel_uncore ipmi_msghandler libata mei joydev dca wmi ip_tables
[   70.725024] CPU: 8 PID: 1794 Comm: perf Not tainted 5.12.0-rc2-00303-gc928e9b1439d #1
[   70.734501] Hardware name: Intel Corporation S2600WP/S2600WP, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[   70.746688] RIP: 0010:version_proc_show (kbuild/src/consumer/fs/proc/version.c:15) 
[ 70.752690] Code: c3 0f 1f 44 00 00 55 48 c7 c6 00 dc 24 82 48 89 fd 48 c7 c7 a8 ed 57 82 e8 af 5d ff ff c6 05 90 60 ba 01 01 8a 05 8a 60 ba 01 <84> c0 74 04 f3 90 eb f2 65 48 8b 04 25 00 6f 01 00 48 8b 80 98 0b
All code
========
   0:	c3                   	retq   
   1:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
   6:	55                   	push   %rbp
   7:	48 c7 c6 00 dc 24 82 	mov    $0xffffffff8224dc00,%rsi
   e:	48 89 fd             	mov    %rdi,%rbp
  11:	48 c7 c7 a8 ed 57 82 	mov    $0xffffffff8257eda8,%rdi
  18:	e8 af 5d ff ff       	callq  0xffffffffffff5dcc
  1d:	c6 05 90 60 ba 01 01 	movb   $0x1,0x1ba6090(%rip)        # 0x1ba60b4
  24:	8a 05 8a 60 ba 01    	mov    0x1ba608a(%rip),%al        # 0x1ba60b4
  2a:*	84 c0                	test   %al,%al		<-- trapping instruction
  2c:	74 04                	je     0x32
  2e:	f3 90                	pause  
  30:	eb f2                	jmp    0x24
  32:	65 48 8b 04 25 00 6f 	mov    %gs:0x16f00,%rax
  39:	01 00 
  3b:	48                   	rex.W
  3c:	8b                   	.byte 0x8b
  3d:	80                   	.byte 0x80
  3e:	98                   	cwtl   
  3f:	0b                   	.byte 0xb

Code starting with the faulting instruction
===========================================
   0:	84 c0                	test   %al,%al
   2:	74 04                	je     0x8
   4:	f3 90                	pause  
   6:	eb f2                	jmp    0xfffffffffffffffa
   8:	65 48 8b 04 25 00 6f 	mov    %gs:0x16f00,%rax
   f:	01 00 
  11:	48                   	rex.W
  12:	8b                   	.byte 0x8b
  13:	80                   	.byte 0x80
  14:	98                   	cwtl   
  15:	0b                   	.byte 0xb
[   70.775248] RSP: 0018:ffffc9000b84bdd0 EFLAGS: 00000202
[   70.781821] RAX: 0000000000000001 RBX: ffff888111af7ca8 RCX: 0000000000000000
[   70.790549] RDX: 0000000000000000 RSI: ffff888f02a177f0 RDI: ffff888f02a177f0
[   70.799294] RBP: ffff888111af7ca8 R08: ffff888f02a177f0 R09: ffffc9000b84bbf0
[   70.808019] R10: 0000000000000001 R11: 0000000000000001 R12: ffffc9000b84be88
[   70.816751] R13: ffffc9000b84be60 R14: ffff888111af7cd0 R15: 0000000000000001
[   70.825493] FS:  00007f3b1f0397c0(0000) GS:ffff888f02a00000(0000) knlGS:0000000000000000
[   70.835310] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   70.842505] CR2: 00005650280be178 CR3: 0000000194ba2004 CR4: 00000000001706e0
[   70.851261] Call Trace:
[   70.854739] seq_read_iter (kbuild/src/consumer/fs/seq_file.c:227) 
[   70.859674] proc_reg_read_iter (kbuild/src/consumer/fs/proc/inode.c:311) 
[   70.864901] new_sync_read (kbuild/src/consumer/fs/read_write.c:416 (discriminator 1)) 
[   70.869838] vfs_read (kbuild/src/consumer/fs/read_write.c:496) 
[   70.874326] ksys_read (kbuild/src/consumer/fs/read_write.c:634) 
[   70.878653] do_syscall_64 (kbuild/src/consumer/arch/x86/entry/common.c:46) 
[   70.883389] entry_SYSCALL_64_after_hwframe (kbuild/src/consumer/arch/x86/entry/entry_64.S:112) 
[   70.889715] RIP: 0033:0x7f3b1fcd5461
[ 70.894432] Code: fe ff ff 50 48 8d 3d fe d0 09 00 e8 e9 03 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 99 62 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
All code
========
   0:	fe                   	(bad)  
   1:	ff                   	(bad)  
   2:	ff 50 48             	callq  *0x48(%rax)
   5:	8d 3d fe d0 09 00    	lea    0x9d0fe(%rip),%edi        # 0x9d109
   b:	e8 e9 03 02 00       	callq  0x203f9
  10:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  17:	00 00 
  19:	48 8d 05 99 62 0d 00 	lea    0xd6299(%rip),%rax        # 0xd62b9
  20:	8b 00                	mov    (%rax),%eax
  22:	85 c0                	test   %eax,%eax
  24:	75 13                	jne    0x39
  26:	31 c0                	xor    %eax,%eax
  28:	0f 05                	syscall 
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 57                	ja     0x89
  32:	c3                   	retq   
  33:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  39:	41 54                	push   %r12
  3b:	49 89 d4             	mov    %rdx,%r12
  3e:	55                   	push   %rbp
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 57                	ja     0x5f
   8:	c3                   	retq   
   9:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
   f:	41 54                	push   %r12
  11:	49 89 d4             	mov    %rdx,%r12
  14:	55                   	push   %rbp
  15:	48                   	rex.W
[   70.916824] RSP: 002b:00007fffa9898db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[   70.922741] watchdog: BUG: soft lockup - CPU#27 stuck for 27s! [perf:1882]
[   70.925999] RAX: ffffffffffffffda RBX: 0000565027ffc970 RCX: 00007f3b1fcd5461
[   70.934426] Modules linked in: dm_mod
[   70.943145] RDX: 0000000000000400 RSI: 0000565027ffcc20 RDI: 0000000000000004
[   70.943147] RBP: 0000000000000d68 R08: 0000000000000001 R09: 0000000000000000
[   70.947963]  xfs
[   70.956663] R10: 00007f3b1f0397c0 R11: 0000000000000246 R12: 00007f3b1fda2760
[   70.965389]  libcrc32c
[   70.968210] R13: 00007f3b1fda32a0 R14: 0000000000000fff R15: 0000565027ffc970
[   70.976908]  sd_mod
[   70.980313] Kernel panic - not syncing: softlockup: hung tasks
[   70.989033]  t10_pi
[   70.992181] CPU: 8 PID: 1794 Comm: perf Tainted: G             L    5.12.0-rc2-00303-gc928e9b1439d #1
[   70.999457]  sg
[   71.002521] Hardware name: Intel Corporation S2600WP/S2600WP, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[   71.013538]  intel_rapl_msr
[   71.016248] Call Trace:
[   71.028495]  intel_rapl_common
[   71.032402]  <IRQ>
[   71.035857]  sb_edac
[   71.039990] dump_stack (kbuild/src/consumer/lib/dump_stack.c:122) 
[   71.042930]  x86_pkg_temp_thermal
[   71.046073] panic (kbuild/src/consumer/kernel/panic.c:249) 
[   71.050431]  intel_powerclamp
[   71.054752] watchdog_timer_fn.cold (kbuild/src/consumer/kernel/watchdog.c:433) 
[   71.058764]  coretemp
[   71.062645] ? softlockup_fn (kbuild/src/consumer/kernel/watchdog.c:354) 
[   71.068018]  kvm_intel
[   71.071163] __hrtimer_run_queues (kbuild/src/consumer/kernel/time/hrtimer.c:1519 kbuild/src/consumer/kernel/time/hrtimer.c:1583) 
[   71.075919]  kvm
[   71.079161] hrtimer_interrupt (kbuild/src/consumer/kernel/time/hrtimer.c:1648) 
[   71.084497]  irqbypass
[   71.087153] __sysvec_apic_timer_interrupt (kbuild/src/consumer/arch/x86/include/asm/jump_label.h:25 kbuild/src/consumer/include/linux/jump_label.h:200 kbuild/src/consumer/arch/x86/include/asm/trace/irq_vectors.h:41 kbuild/src/consumer/arch/x86/kernel/apic/apic.c:1107) 
[   71.092326]  mgag200
[   71.095499] sysvec_apic_timer_interrupt (kbuild/src/consumer/arch/x86/kernel/apic/apic.c:1100 (discriminator 14)) 
[   71.101652]  crct10dif_pclmul
[   71.104609]  </IRQ>
[   71.110501]  crc32_pclmul
[   71.114365] asm_sysvec_apic_timer_interrupt (kbuild/src/consumer/arch/x86/include/asm/idtentry.h:632) 
[   71.117264]  crc32c_intel
[   71.120677] RIP: 0010:version_proc_show (kbuild/src/consumer/fs/proc/version.c:15) 
[   71.126951]  drm_kms_helper
[ 71.130406] Code: c3 0f 1f 44 00 00 55 48 c7 c6 00 dc 24 82 48 89 fd 48 c7 c7 a8 ed 57 82 e8 af 5d ff ff c6 05 90 60 ba 01 01 8a 05 8a 60 ba 01 <84> c0 74 04 f3 90 eb f2 65 48 8b 04 25 00 6f 01 00 48 8b 80 98 0b
All code
========
   0:	c3                   	retq   
   1:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
   6:	55                   	push   %rbp
   7:	48 c7 c6 00 dc 24 82 	mov    $0xffffffff8224dc00,%rsi
   e:	48 89 fd             	mov    %rdi,%rbp
  11:	48 c7 c7 a8 ed 57 82 	mov    $0xffffffff8257eda8,%rdi
  18:	e8 af 5d ff ff       	callq  0xffffffffffff5dcc
  1d:	c6 05 90 60 ba 01 01 	movb   $0x1,0x1ba6090(%rip)        # 0x1ba60b4
  24:	8a 05 8a 60 ba 01    	mov    0x1ba608a(%rip),%al        # 0x1ba60b4
  2a:*	84 c0                	test   %al,%al		<-- trapping instruction
  2c:	74 04                	je     0x32
  2e:	f3 90                	pause  
  30:	eb f2                	jmp    0x24
  32:	65 48 8b 04 25 00 6f 	mov    %gs:0x16f00,%rax
  39:	01 00 
  3b:	48                   	rex.W
  3c:	8b                   	.byte 0x8b
  3d:	80                   	.byte 0x80
  3e:	98                   	cwtl   
  3f:	0b                   	.byte 0xb

Code starting with the faulting instruction
===========================================
   0:	84 c0                	test   %al,%al
   2:	74 04                	je     0x8
   4:	f3 90                	pause  
   6:	eb f2                	jmp    0xfffffffffffffffa
   8:	65 48 8b 04 25 00 6f 	mov    %gs:0x16f00,%rax
   f:	01 00 
  11:	48                   	rex.W
  12:	8b                   	.byte 0x8b
  13:	80                   	.byte 0x80
  14:	98                   	cwtl   
  15:	0b                   	.byte 0xb
[   71.136243]  ghash_clmulni_intel
[   71.139898] RSP: 0018:ffffc9000b84bdd0 EFLAGS: 00000202
[   71.162013]  isci
[   71.166220]
[   71.172580]  syscopyarea
[   71.175291] RAX: 0000000000000001 RBX: ffff888111af7ca8 RCX: 0000000000000000
[   71.177465]  sysfillrect
[   71.180792] RDX: 0000000000000000 RSI: ffff888f02a177f0 RDI: ffff888f02a177f0
[   71.189291]  rapl
[   71.192595] RBP: ffff888111af7ca8 R08: ffff888f02a177f0 R09: ffffc9000b84bbf0
[   71.201111]  sysimgblt
[   71.203764] R10: 0000000000000001 R11: 0000000000000001 R12: ffffc9000b84be88
[   71.212307]  libsas
[   71.215453] R13: ffffc9000b84be60 R14: ffff888111af7cd0 R15: 0000000000000001
[   71.223956]  fb_sys_fops
[   71.226799] seq_read_iter (kbuild/src/consumer/fs/seq_file.c:227) 
[   71.235309]  ahci
[   71.238628] proc_reg_read_iter (kbuild/src/consumer/fs/proc/inode.c:311) 
[   71.243337]  intel_cstate
[   71.246008] new_sync_read (kbuild/src/consumer/fs/read_write.c:416 (discriminator 1)) 
[   71.250990]  ipmi_si
[   71.254390] vfs_read (kbuild/src/consumer/fs/read_write.c:496) 
[   71.259089]  scsi_transport_sas
[   71.262024] ksys_read (kbuild/src/consumer/fs/read_write.c:634) 
[   71.266253]  libahci
[   71.270260] do_syscall_64 (kbuild/src/consumer/arch/x86/entry/common.c:46) 
[   71.274349]  mei_me
[   71.277282] entry_SYSCALL_64_after_hwframe (kbuild/src/consumer/arch/x86/entry/entry_64.S:112) 
[   71.281743]  ipmi_devintf
[   71.284552] RIP: 0033:0x7f3b1fcd5461
[   71.290647]  ioatdma
[ 71.294094] Code: fe ff ff 50 48 8d 3d fe d0 09 00 e8 e9 03 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 99 62 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
All code
========
   0:	fe                   	(bad)  
   1:	ff                   	(bad)  
   2:	ff 50 48             	callq  *0x48(%rax)
   5:	8d 3d fe d0 09 00    	lea    0x9d0fe(%rip),%edi        # 0x9d109
   b:	e8 e9 03 02 00       	callq  0x203f9
  10:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  17:	00 00 
  19:	48 8d 05 99 62 0d 00 	lea    0xd6299(%rip),%rax        # 0xd62b9
  20:	8b 00                	mov    (%rax),%eax
  22:	85 c0                	test   %eax,%eax
  24:	75 13                	jne    0x39
  26:	31 c0                	xor    %eax,%eax
  28:	0f 05                	syscall 
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 57                	ja     0x89
  32:	c3                   	retq   
  33:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  39:	41 54                	push   %r12
  3b:	49 89 d4             	mov    %rdx,%r12
  3e:	55                   	push   %rbp
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 57                	ja     0x5f
   8:	c3                   	retq   
   9:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
   f:	41 54                	push   %r12
  11:	49 89 d4             	mov    %rdx,%r12
  14:	55                   	push   %rbp
  15:	48                   	rex.W
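
A note on reading the kernel-side decode above: the bytes around the trapping instruction form a short busy-wait. A byte is set to 1 (offset 0x1d), reloaded (0x24), tested (0x2a), and, while it is non-zero, the CPU executes pause and jumps back to the reload; RAX is 1 in the register dump, so the exit branch is never taken. This is only a reading of the disassembly, not the patch source. The sketch below re-derives the two branch targets from the raw bytes; the helper name short_jump_target is hypothetical and not part of any kernel or LKP tooling.

```python
def short_jump_target(offset, insn_bytes):
    """Target of a 2-byte x86 short jump (opcode + signed 8-bit displacement).

    offset is the instruction's position in the decoded dump.
    """
    opcode, rel8 = insn_bytes
    # Only JMP rel8 (0xEB) and Jcc rel8 (0x70..0x7F) are handled here.
    if opcode != 0xEB and not 0x70 <= opcode <= 0x7F:
        raise ValueError("not a short jump")
    if rel8 >= 0x80:          # interpret the displacement as signed
        rel8 -= 0x100
    # The displacement is relative to the end of the jump instruction.
    return offset + len(insn_bytes) + rel8


# Bytes and offsets taken from the "All code" listing in this report.
je_target = short_jump_target(0x2C, bytes.fromhex("7404"))   # je: leaves the loop
jmp_target = short_jump_target(0x30, bytes.fromhex("ebf2"))  # jmp: back to the reload
print(hex(je_target), hex(jmp_target))  # prints: 0x32 0x24
```

The backward jump lands on the load at 0x24, so the loop body is just load, test, pause. That matches a CPU spinning in version_proc_show until the watchdog fires.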


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install                job.yaml  # job file is attached to this email
        bin/lkp split-job --compatible job.yaml
        bin/lkp run                    compatible-job.yaml
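
To check whether a reproduction run hit the same lockup, the watchdog headlines can be pulled out of the captured kernel log. The following is a minimal sketch, assuming the log is available as plain text and the headline has the same wording as in this report; the function name find_soft_lockups is hypothetical and not part of lkp-tests.

```python
import re

# Headline format as it appears in this report, e.g.
# "watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [perf:1794]"
SOFT_LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+?):(?P<pid>\d+)\]"
)


def find_soft_lockups(dmesg_text):
    """Return one dict per soft-lockup headline found in dmesg_text."""
    hits = []
    for m in SOFT_LOCKUP_RE.finditer(dmesg_text):
        hits.append({
            "cpu": int(m.group("cpu")),
            "secs": int(m.group("secs")),
            "comm": m.group("comm"),
            "pid": int(m.group("pid")),
        })
    return hits


# The two headlines from the log in this report.
sample = """\
[   70.666742] watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [perf:1794]
[   70.922741] watchdog: BUG: soft lockup - CPU#27 stuck for 27s! [perf:1882]
"""

for hit in find_soft_lockups(sample):
    print(hit)
```

Matching on the headline alone sidesteps the interleaving seen in the log above, where output from CPU 8 and CPU 27 is mixed line by line.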



---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


View attachment "config-5.12.0-rc2-00303-gc928e9b1439d" of type "text/plain" (172899 bytes)

View attachment "job-script" of type "text/plain" (8366 bytes)

Download attachment "dmesg.xz" of type "application/x-xz" (27364 bytes)

View attachment "job.yaml" of type "text/plain" (5524 bytes)
