lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202209221401.90061e56-yujie.liu@intel.com>
Date:   Thu, 22 Sep 2022 15:27:05 +0800
From:   kernel test robot <yujie.liu@...el.com>
To:     Josh Triplett <josh@...htriplett.org>
CC:     <lkp@...ts.01.org>, <lkp@...el.com>, <linux-mm@...ck.org>,
        Eric Biederman <ebiederm@...ssion.com>,
        Kees Cook <keescook@...omium.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        <linux-fsdevel@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: [fs/exec.c] 0a276ae2d2: BUG:workqueue_lockup-pool

Greeting,

FYI, we noticed the following commit (built with gcc-11):

commit: 0a276ae2d224d48c9ddbeb9fa14bc2fd0893a7b9 ("[PATCH] fs/exec.c: Add fast path for ENOENT on PATH search before allocating mm")
url: https://github.com/intel-lab-lkp/linux/commits/Josh-Triplett/fs-exec-c-Add-fast-path-for-ENOENT-on-PATH-search-before-allocating-mm/20220916-214429
base: https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git for-next/execve
patch link: https://lore.kernel.org/linux-mm/5c7333ea4bec2fad1b47a8fa2db7c31e4ffc4f14.1663334978.git.josh@joshtriplett.org

in testcase: trinity
version: trinity-i386-4d2343bd-1_20200320
with following parameters:

	runtime: 300s
	group: group-01

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


[  129.613450][    C0] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 43s!
[  129.616497][    C0] Showing busy workqueues and worker pools:
[  129.620580][    C0] workqueue events: flags=0x0
[  129.623078][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=4/256 refcnt=5
[  129.623118][    C0]     in-flight: 52:do_free_init do_free_init, 29:key_garbage_collector key_garbage_collector
[  129.623151][    C0]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
[  129.623161][    C0]     pending: vmstat_shepherd, drm_fb_helper_damage_work [drm_kms_helper]
[  129.637767][    C0] workqueue events_highpri: flags=0x10
[  129.640231][    C0]   pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
[  129.640252][    C0]     pending: mix_interrupt_randomness
[  129.640271][    C0] workqueue events_freezable_power_: flags=0x84
[  129.650217][    C0]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  129.650238][    C0]     pending: disk_events_workfn
[  129.650254][    C0] workqueue rcu_gp: flags=0x8
[  129.659366][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  129.659387][    C0]     in-flight: 185:wait_rcu_exp_gp
[  129.659405][    C0] workqueue mm_percpu_wq: flags=0x8
[  129.668545][    C0]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 refcnt=4
[  129.668565][    C0]     pending: vmstat_update, lru_add_drain_per_cpu BAR(35)
[  129.668595][    C0] workqueue kblockd: flags=0x18
[  129.678480][    C0]   pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
[  129.678499][    C0]     pending: blk_mq_timeout_work
[  129.678560][    C0] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=0s workers=5 idle: 3764 22
[  175.181511][  T185] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-.... } 88583 jiffies s: 41 root: 0x1/.
[  175.185095][  T185] rcu: blocking rcu_node structures (internal RCU debug):
[  175.188233][  T185] Task dump for CPU 0:
[  175.191019][  T185] task:trinity-subchil state:R  running task     stack:    0 pid: 3731 ppid:  3652 flags:0x2000000c
[  175.198313][  T185] Call Trace:
[  175.200246][  T185]  <TASK>
[ 175.202827][ T185] ? do_fast_syscall_32 (arch/x86/entry/common.c:203) 
[ 175.205560][ T185] ? entry_SYSENTER_compat_after_hwframe (arch/x86/entry/entry_64_compat.S:122) 
[  175.208517][  T185]  </TASK>
[  186.313437][    C0] rcu: INFO: rcu_sched self-detected stall on CPU
[  186.315637][    C0] rcu: 	0-....: (99854 ticks this GP) idle=64dc/1/0x4000000000000000 softirq=25122/25122 fqs=30028
[  186.317670][    C0] 	(t=100000 jiffies g=32689 q=29567 ncpus=2)
[  186.319650][    C0] NMI backtrace for cpu 0
[  186.321784][    C0] CPU: 0 PID: 3731 Comm: trinity-subchil Not tainted 6.0.0-rc2-00003-g0a276ae2d224 #1
[  186.324188][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[  186.326774][    C0] Call Trace:
[  186.328778][    C0]  <IRQ>
[ 186.331085][ C0] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) 
[ 186.332793][ C0] nmi_cpu_backtrace.cold (lib/nmi_backtrace.c:111) 
[ 186.334549][ C0] ? lapic_can_unplug_cpu (arch/x86/kernel/apic/hw_nmi.c:33) 
[ 186.336150][ C0] nmi_trigger_cpumask_backtrace (lib/nmi_backtrace.c:62) 
[ 186.337777][ C0] trigger_single_cpu_backtrace (include/linux/nmi.h:168) 
[ 186.339411][ C0] rcu_dump_cpu_stacks (kernel/rcu/tree_stall.h:371) 
[ 186.341082][ C0] print_cpu_stall.cold (kernel/rcu/tree_stall.h:662) 
[ 186.342889][ C0] check_cpu_stall (kernel/rcu/tree_stall.h:745) 
[ 186.344423][ C0] rcu_sched_clock_irq (kernel/rcu/tree_nocb.h:913 kernel/rcu/tree.c:3663 kernel/rcu/tree.c:2342) 
[ 186.346071][ C0] update_process_times (arch/x86/include/asm/preempt.h:27 kernel/time/timer.c:1841) 
[ 186.347626][ C0] tick_sched_handle (kernel/time/tick-sched.c:244) 
[ 186.349235][ C0] tick_sched_timer (kernel/time/tick-sched.c:1480) 
[ 186.350780][ C0] ? tick_sched_do_timer (kernel/time/tick-sched.c:1467) 
[ 186.352495][ C0] __hrtimer_run_queues (kernel/time/hrtimer.c:1685 kernel/time/hrtimer.c:1749) 
[ 186.354484][ C0] ? enqueue_hrtimer (kernel/time/hrtimer.c:1719) 
[ 186.356550][ C0] ? kvm_clock_get_cycles (arch/x86/include/asm/preempt.h:85 arch/x86/kernel/kvmclock.c:80 arch/x86/kernel/kvmclock.c:86) 
[ 186.358499][ C0] ? ktime_get_update_offsets_now (kernel/time/timekeeping.c:292 (discriminator 3) kernel/time/timekeeping.c:388 (discriminator 3) kernel/time/timekeeping.c:2320 (discriminator 3)) 
[ 186.360609][ C0] hrtimer_interrupt (kernel/time/hrtimer.c:1814) 
[ 186.362418][ C0] __sysvec_apic_timer_interrupt (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1113) 
[ 186.364295][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1106 (discriminator 14)) 
[  186.366201][    C0]  </IRQ>
[  186.367490][    C0]  <TASK>
[ 186.369287][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:649) 
[ 186.371191][ C0] RIP: 0010:native_queued_spin_lock_slowpath (arch/x86/include/asm/vdso/processor.h:19 arch/x86/include/asm/qspinlock.h:101 kernel/locking/qspinlock.c:327) 
[ 186.373248][ C0] Code: 00 00 00 65 48 2b 04 25 28 00 00 00 0f 85 2f 08 00 00 48 81 c4 88 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc f3 90 <e9> 71 ff ff ff 44 8b 74 24 48 41 81 fe 00 01 00 00 0f 84 e3 00 00
All code
========
   0:	00 00                	add    %al,(%rax)
   2:	00 65 48             	add    %ah,0x48(%rbp)
   5:	2b 04 25 28 00 00 00 	sub    0x28,%eax
   c:	0f 85 2f 08 00 00    	jne    0x841
  12:	48 81 c4 88 00 00 00 	add    $0x88,%rsp
  19:	5b                   	pop    %rbx
  1a:	5d                   	pop    %rbp
  1b:	41 5c                	pop    %r12
  1d:	41 5d                	pop    %r13
  1f:	41 5e                	pop    %r14
  21:	41 5f                	pop    %r15
  23:	c3                   	retq   
  24:	cc                   	int3   
  25:	cc                   	int3   
  26:	cc                   	int3   
  27:	cc                   	int3   
  28:	f3 90                	pause  
  2a:*	e9 71 ff ff ff       	jmpq   0xffffffffffffffa0		<-- trapping instruction
  2f:	44 8b 74 24 48       	mov    0x48(%rsp),%r14d
  34:	41 81 fe 00 01 00 00 	cmp    $0x100,%r14d
  3b:	0f                   	.byte 0xf
  3c:	84 e3                	test   %ah,%bl
	...

Code starting with the faulting instruction
===========================================
   0:	e9 71 ff ff ff       	jmpq   0xffffffffffffff76
   5:	44 8b 74 24 48       	mov    0x48(%rsp),%r14d
   a:	41 81 fe 00 01 00 00 	cmp    $0x100,%r14d
  11:	0f                   	.byte 0xf
  12:	84 e3                	test   %ah,%bl
	...
[  186.377327][    C0] RSP: 0018:ffffc90000e17cb8 EFLAGS: 00000202
[  186.379142][    C0] RAX: 0000000000000000 RBX: 00000000587993b0 RCX: ffffffff812fc8ec
[  186.380912][    C0] RDX: fffff520001c2ff7 RSI: 0000000000000004 RDI: ffffc90000e17fb0
[  186.383091][    C0] RBP: ffffc90000e17fb0 R08: 0000000000000000 R09: ffffc90000e17fb3
[  186.384750][    C0] R10: fffff520001c2ff6 R11: 0000000000000001 R12: 0000000000000003
[  186.386354][    C0] R13: fffff520001c2ff6 R14: 0000000000000001 R15: 1ffff920001c2f98
[ 186.388126][ C0] ? native_queued_spin_lock_slowpath (arch/x86/include/asm/atomic.h:29 include/linux/atomic/atomic-instrumented.h:28 arch/x86/include/asm/qspinlock.h:100 kernel/locking/qspinlock.c:327) 
[ 186.389979][ C0] ? native_queued_spin_lock_slowpath (arch/x86/include/asm/atomic.h:29 include/linux/atomic/atomic-instrumented.h:28 arch/x86/include/asm/qspinlock.h:100 kernel/locking/qspinlock.c:327) 
[ 186.391917][ C0] ? pv_hash (kernel/locking/qspinlock.c:317) 
[ 186.393367][ C0] ? lockref_put_return (arch/x86/include/asm/atomic.h:29 (discriminator 5) include/linux/atomic/atomic-instrumented.h:28 (discriminator 5) include/asm-generic/qspinlock.h:73 (discriminator 5) lib/lockref.c:122 (discriminator 5)) 
[ 186.394784][ C0] _raw_spin_lock (arch/x86/include/asm/paravirt.h:591 arch/x86/include/asm/qspinlock.h:51 include/asm-generic/qspinlock.h:114 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154) 
[ 186.396052][ C0] ? _raw_write_lock_irq (kernel/locking/spinlock.c:153) 
[ 186.397502][ C0] ? allocate_slab (mm/slub.c:378 mm/slub.c:1929 mm/slub.c:1995) 
[ 186.399281][ C0] dput (fs/dcache.c:790 fs/dcache.c:900) 
[ 186.400885][ C0] path_put (fs/namei.c:559) 
[ 186.402250][ C0] do_execveat_common+0x5ec/0x7c0 
[ 186.403558][ C0] ? __check_object_size (mm/memremap.c:37) 
[ 186.404848][ C0] ? alloc_bprm (fs/exec.c:1866) 
[ 186.406039][ C0] __ia32_compat_sys_execve (fs/exec.c:2117) 
[ 186.407302][ C0] __do_fast_syscall_32 (arch/x86/entry/common.c:112 arch/x86/entry/common.c:178) 
[ 186.408512][ C0] do_fast_syscall_32 (arch/x86/entry/common.c:203) 
[ 186.409678][ C0] entry_SYSENTER_compat_after_hwframe (arch/x86/entry/entry_64_compat.S:122) 
[  186.410947][    C0] RIP: 0023:0xf7f3c549
[ 186.412085][ C0] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
All code
========
   0:	03 74 c0 01          	add    0x1(%rax,%rax,8),%esi
   4:	10 05 03 74 b8 01    	adc    %al,0x1b87403(%rip)        # 0x1b8740d
   a:	10 06                	adc    %al,(%rsi)
   c:	03 74 b4 01          	add    0x1(%rsp,%rsi,4),%esi
  10:	10 07                	adc    %al,(%rdi)
  12:	03 74 b0 01          	add    0x1(%rax,%rsi,4),%esi
  16:	10 08                	adc    %cl,(%rax)
  18:	03 74 d8 01          	add    0x1(%rax,%rbx,8),%esi
  1c:	00 00                	add    %al,(%rax)
  1e:	00 00                	add    %al,(%rax)
  20:	00 51 52             	add    %dl,0x52(%rcx)
  23:	55                   	push   %rbp
  24:	89 e5                	mov    %esp,%ebp
  26:	0f 34                	sysenter 
  28:	cd 80                	int    $0x80
  2a:*	5d                   	pop    %rbp		<-- trapping instruction
  2b:	5a                   	pop    %rdx
  2c:	59                   	pop    %rcx
  2d:	c3                   	retq   
  2e:	90                   	nop
  2f:	90                   	nop
  30:	90                   	nop
  31:	90                   	nop
  32:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
  39:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi

Code starting with the faulting instruction
===========================================
   0:	5d                   	pop    %rbp
   1:	5a                   	pop    %rdx
   2:	59                   	pop    %rcx
   3:	c3                   	retq   
   4:	90                   	nop
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop
   8:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
   f:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi


If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <yujie.liu@...el.com>
| Link: https://lore.kernel.org/r/202209221401.90061e56-yujie.liu@intel.com


To reproduce:

        # build kernel
	cd linux
	cp config-6.0.0-rc2-00003-g0a276ae2d224 .config
	make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
	make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
	cd <mod-install-dir>
	find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.


-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

View attachment "config-6.0.0-rc2-00003-g0a276ae2d224" of type "text/plain" (167468 bytes)

View attachment "job-script" of type "text/plain" (4742 bytes)

Download attachment "dmesg.xz" of type "application/x-xz" (34124 bytes)

View attachment "trinity" of type "text/plain" (5941 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ