[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202209221401.90061e56-yujie.liu@intel.com>
Date: Thu, 22 Sep 2022 15:27:05 +0800
From: kernel test robot <yujie.liu@...el.com>
To: Josh Triplett <josh@...htriplett.org>
CC: <lkp@...ts.01.org>, <lkp@...el.com>, <linux-mm@...ck.org>,
Eric Biederman <ebiederm@...ssion.com>,
Kees Cook <keescook@...omium.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
<linux-fsdevel@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: [fs/exec.c] 0a276ae2d2: BUG:workqueue_lockup-pool
Greeting,
FYI, we noticed the following commit (built with gcc-11):
commit: 0a276ae2d224d48c9ddbeb9fa14bc2fd0893a7b9 ("[PATCH] fs/exec.c: Add fast path for ENOENT on PATH search before allocating mm")
url: https://github.com/intel-lab-lkp/linux/commits/Josh-Triplett/fs-exec-c-Add-fast-path-for-ENOENT-on-PATH-search-before-allocating-mm/20220916-214429
base: https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git for-next/execve
patch link: https://lore.kernel.org/linux-mm/5c7333ea4bec2fad1b47a8fa2db7c31e4ffc4f14.1663334978.git.josh@joshtriplett.org
in testcase: trinity
version: trinity-i386-4d2343bd-1_20200320
with following parameters:
runtime: 300s
group: group-01
test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
[ 129.613450][ C0] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 43s!
[ 129.616497][ C0] Showing busy workqueues and worker pools:
[ 129.620580][ C0] workqueue events: flags=0x0
[ 129.623078][ C0] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=4/256 refcnt=5
[ 129.623118][ C0] in-flight: 52:do_free_init do_free_init, 29:key_garbage_collector key_garbage_collector
[ 129.623151][ C0] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
[ 129.623161][ C0] pending: vmstat_shepherd, drm_fb_helper_damage_work [drm_kms_helper]
[ 129.637767][ C0] workqueue events_highpri: flags=0x10
[ 129.640231][ C0] pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
[ 129.640252][ C0] pending: mix_interrupt_randomness
[ 129.640271][ C0] workqueue events_freezable_power_: flags=0x84
[ 129.650217][ C0] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 129.650238][ C0] pending: disk_events_workfn
[ 129.650254][ C0] workqueue rcu_gp: flags=0x8
[ 129.659366][ C0] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 129.659387][ C0] in-flight: 185:wait_rcu_exp_gp
[ 129.659405][ C0] workqueue mm_percpu_wq: flags=0x8
[ 129.668545][ C0] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 refcnt=4
[ 129.668565][ C0] pending: vmstat_update, lru_add_drain_per_cpu BAR(35)
[ 129.668595][ C0] workqueue kblockd: flags=0x18
[ 129.678480][ C0] pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
[ 129.678499][ C0] pending: blk_mq_timeout_work
[ 129.678560][ C0] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=0s workers=5 idle: 3764 22
[ 175.181511][ T185] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-.... } 88583 jiffies s: 41 root: 0x1/.
[ 175.185095][ T185] rcu: blocking rcu_node structures (internal RCU debug):
[ 175.188233][ T185] Task dump for CPU 0:
[ 175.191019][ T185] task:trinity-subchil state:R running task stack: 0 pid: 3731 ppid: 3652 flags:0x2000000c
[ 175.198313][ T185] Call Trace:
[ 175.200246][ T185] <TASK>
[ 175.202827][ T185] ? do_fast_syscall_32 (arch/x86/entry/common.c:203)
[ 175.205560][ T185] ? entry_SYSENTER_compat_after_hwframe (arch/x86/entry/entry_64_compat.S:122)
[ 175.208517][ T185] </TASK>
[ 186.313437][ C0] rcu: INFO: rcu_sched self-detected stall on CPU
[ 186.315637][ C0] rcu: 0-....: (99854 ticks this GP) idle=64dc/1/0x4000000000000000 softirq=25122/25122 fqs=30028
[ 186.317670][ C0] (t=100000 jiffies g=32689 q=29567 ncpus=2)
[ 186.319650][ C0] NMI backtrace for cpu 0
[ 186.321784][ C0] CPU: 0 PID: 3731 Comm: trinity-subchil Not tainted 6.0.0-rc2-00003-g0a276ae2d224 #1
[ 186.324188][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 186.326774][ C0] Call Trace:
[ 186.328778][ C0] <IRQ>
[ 186.331085][ C0] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
[ 186.332793][ C0] nmi_cpu_backtrace.cold (lib/nmi_backtrace.c:111)
[ 186.334549][ C0] ? lapic_can_unplug_cpu (arch/x86/kernel/apic/hw_nmi.c:33)
[ 186.336150][ C0] nmi_trigger_cpumask_backtrace (lib/nmi_backtrace.c:62)
[ 186.337777][ C0] trigger_single_cpu_backtrace (include/linux/nmi.h:168)
[ 186.339411][ C0] rcu_dump_cpu_stacks (kernel/rcu/tree_stall.h:371)
[ 186.341082][ C0] print_cpu_stall.cold (kernel/rcu/tree_stall.h:662)
[ 186.342889][ C0] check_cpu_stall (kernel/rcu/tree_stall.h:745)
[ 186.344423][ C0] rcu_sched_clock_irq (kernel/rcu/tree_nocb.h:913 kernel/rcu/tree.c:3663 kernel/rcu/tree.c:2342)
[ 186.346071][ C0] update_process_times (arch/x86/include/asm/preempt.h:27 kernel/time/timer.c:1841)
[ 186.347626][ C0] tick_sched_handle (kernel/time/tick-sched.c:244)
[ 186.349235][ C0] tick_sched_timer (kernel/time/tick-sched.c:1480)
[ 186.350780][ C0] ? tick_sched_do_timer (kernel/time/tick-sched.c:1467)
[ 186.352495][ C0] __hrtimer_run_queues (kernel/time/hrtimer.c:1685 kernel/time/hrtimer.c:1749)
[ 186.354484][ C0] ? enqueue_hrtimer (kernel/time/hrtimer.c:1719)
[ 186.356550][ C0] ? kvm_clock_get_cycles (arch/x86/include/asm/preempt.h:85 arch/x86/kernel/kvmclock.c:80 arch/x86/kernel/kvmclock.c:86)
[ 186.358499][ C0] ? ktime_get_update_offsets_now (kernel/time/timekeeping.c:292 (discriminator 3) kernel/time/timekeeping.c:388 (discriminator 3) kernel/time/timekeeping.c:2320 (discriminator 3))
[ 186.360609][ C0] hrtimer_interrupt (kernel/time/hrtimer.c:1814)
[ 186.362418][ C0] __sysvec_apic_timer_interrupt (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1113)
[ 186.364295][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1106 (discriminator 14))
[ 186.366201][ C0] </IRQ>
[ 186.367490][ C0] <TASK>
[ 186.369287][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:649)
[ 186.371191][ C0] RIP: 0010:native_queued_spin_lock_slowpath (arch/x86/include/asm/vdso/processor.h:19 arch/x86/include/asm/qspinlock.h:101 kernel/locking/qspinlock.c:327)
[ 186.373248][ C0] Code: 00 00 00 65 48 2b 04 25 28 00 00 00 0f 85 2f 08 00 00 48 81 c4 88 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc f3 90 <e9> 71 ff ff ff 44 8b 74 24 48 41 81 fe 00 01 00 00 0f 84 e3 00 00
All code
========
0: 00 00 add %al,(%rax)
2: 00 65 48 add %ah,0x48(%rbp)
5: 2b 04 25 28 00 00 00 sub 0x28,%eax
c: 0f 85 2f 08 00 00 jne 0x841
12: 48 81 c4 88 00 00 00 add $0x88,%rsp
19: 5b pop %rbx
1a: 5d pop %rbp
1b: 41 5c pop %r12
1d: 41 5d pop %r13
1f: 41 5e pop %r14
21: 41 5f pop %r15
23: c3 retq
24: cc int3
25: cc int3
26: cc int3
27: cc int3
28: f3 90 pause
2a:* e9 71 ff ff ff jmpq 0xffffffffffffffa0 <-- trapping instruction
2f: 44 8b 74 24 48 mov 0x48(%rsp),%r14d
34: 41 81 fe 00 01 00 00 cmp $0x100,%r14d
3b: 0f .byte 0xf
3c: 84 e3 test %ah,%bl
...
Code starting with the faulting instruction
===========================================
0: e9 71 ff ff ff jmpq 0xffffffffffffff76
5: 44 8b 74 24 48 mov 0x48(%rsp),%r14d
a: 41 81 fe 00 01 00 00 cmp $0x100,%r14d
11: 0f .byte 0xf
12: 84 e3 test %ah,%bl
...
[ 186.377327][ C0] RSP: 0018:ffffc90000e17cb8 EFLAGS: 00000202
[ 186.379142][ C0] RAX: 0000000000000000 RBX: 00000000587993b0 RCX: ffffffff812fc8ec
[ 186.380912][ C0] RDX: fffff520001c2ff7 RSI: 0000000000000004 RDI: ffffc90000e17fb0
[ 186.383091][ C0] RBP: ffffc90000e17fb0 R08: 0000000000000000 R09: ffffc90000e17fb3
[ 186.384750][ C0] R10: fffff520001c2ff6 R11: 0000000000000001 R12: 0000000000000003
[ 186.386354][ C0] R13: fffff520001c2ff6 R14: 0000000000000001 R15: 1ffff920001c2f98
[ 186.388126][ C0] ? native_queued_spin_lock_slowpath (arch/x86/include/asm/atomic.h:29 include/linux/atomic/atomic-instrumented.h:28 arch/x86/include/asm/qspinlock.h:100 kernel/locking/qspinlock.c:327)
[ 186.389979][ C0] ? native_queued_spin_lock_slowpath (arch/x86/include/asm/atomic.h:29 include/linux/atomic/atomic-instrumented.h:28 arch/x86/include/asm/qspinlock.h:100 kernel/locking/qspinlock.c:327)
[ 186.391917][ C0] ? pv_hash (kernel/locking/qspinlock.c:317)
[ 186.393367][ C0] ? lockref_put_return (arch/x86/include/asm/atomic.h:29 (discriminator 5) include/linux/atomic/atomic-instrumented.h:28 (discriminator 5) include/asm-generic/qspinlock.h:73 (discriminator 5) lib/lockref.c:122 (discriminator 5))
[ 186.394784][ C0] _raw_spin_lock (arch/x86/include/asm/paravirt.h:591 arch/x86/include/asm/qspinlock.h:51 include/asm-generic/qspinlock.h:114 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[ 186.396052][ C0] ? _raw_write_lock_irq (kernel/locking/spinlock.c:153)
[ 186.397502][ C0] ? allocate_slab (mm/slub.c:378 mm/slub.c:1929 mm/slub.c:1995)
[ 186.399281][ C0] dput (fs/dcache.c:790 fs/dcache.c:900)
[ 186.400885][ C0] path_put (fs/namei.c:559)
[ 186.402250][ C0] do_execveat_common+0x5ec/0x7c0
[ 186.403558][ C0] ? __check_object_size (mm/memremap.c:37)
[ 186.404848][ C0] ? alloc_bprm (fs/exec.c:1866)
[ 186.406039][ C0] __ia32_compat_sys_execve (fs/exec.c:2117)
[ 186.407302][ C0] __do_fast_syscall_32 (arch/x86/entry/common.c:112 arch/x86/entry/common.c:178)
[ 186.408512][ C0] do_fast_syscall_32 (arch/x86/entry/common.c:203)
[ 186.409678][ C0] entry_SYSENTER_compat_after_hwframe (arch/x86/entry/entry_64_compat.S:122)
[ 186.410947][ C0] RIP: 0023:0xf7f3c549
[ 186.412085][ C0] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
All code
========
0: 03 74 c0 01 add 0x1(%rax,%rax,8),%esi
4: 10 05 03 74 b8 01 adc %al,0x1b87403(%rip) # 0x1b8740d
a: 10 06 adc %al,(%rsi)
c: 03 74 b4 01 add 0x1(%rsp,%rsi,4),%esi
10: 10 07 adc %al,(%rdi)
12: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi
16: 10 08 adc %cl,(%rax)
18: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi
1c: 00 00 add %al,(%rax)
1e: 00 00 add %al,(%rax)
20: 00 51 52 add %dl,0x52(%rcx)
23: 55 push %rbp
24: 89 e5 mov %esp,%ebp
26: 0f 34 sysenter
28: cd 80 int $0x80
2a:* 5d pop %rbp <-- trapping instruction
2b: 5a pop %rdx
2c: 59 pop %rcx
2d: c3 retq
2e: 90 nop
2f: 90 nop
30: 90 nop
31: 90 nop
32: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
39: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
Code starting with the faulting instruction
===========================================
0: 5d pop %rbp
1: 5a pop %rdx
2: 59 pop %rcx
3: c3 retq
4: 90 nop
5: 90 nop
6: 90 nop
7: 90 nop
8: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
f: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <yujie.liu@...el.com>
| Link: https://lore.kernel.org/r/202209221401.90061e56-yujie.liu@intel.com
To reproduce:
# build kernel
cd linux
cp config-6.0.0-rc2-00003-g0a276ae2d224 .config
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
View attachment "config-6.0.0-rc2-00003-g0a276ae2d224" of type "text/plain" (167468 bytes)
View attachment "job-script" of type "text/plain" (4742 bytes)
Download attachment "dmesg.xz" of type "application/x-xz" (34124 bytes)
View attachment "trinity" of type "text/plain" (5941 bytes)
Powered by blists - more mailing lists