Message-ID: <Zph5q+XdNN1bn0Ot@xpf.sh.intel.com>
Date: Thu, 18 Jul 2024 10:10:51 +0800
From: Pengfei Xu <pengfei.xu@...el.com>
To: <namhyung@...nel.org>
CC: <linux-kernel@...r.kernel.org>, <bpf@...r.kernel.org>,
<peterz@...radead.org>
Subject: [Syzkaller & bisect] There is deadlock in __bpf_ringbuf_reserve in
v6.10
Hi Namhyung Kim and bpf experts,
Greetings!
There is a deadlock in __bpf_ringbuf_reserve in v6.10.
Bisect found the first bad commit:
ee042be16cb4 ("locking: Apply contention tracepoints in the slow path")
All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/240717_170536___bpf_ringbuf_reserve
Syzkaller repro code: https://github.com/xupengfe/syzkaller_logs/blob/main/240717_170536___bpf_ringbuf_reserve/repro.c
Syzkaller repro syscall: https://github.com/xupengfe/syzkaller_logs/blob/main/240717_170536___bpf_ringbuf_reserve/repro.prog
Syzkaller report: https://github.com/xupengfe/syzkaller_logs/blob/main/240717_170536___bpf_ringbuf_reserve/repro.report
Kconfig(make olddefconfig): https://github.com/xupengfe/syzkaller_logs/blob/main/240717_170536___bpf_ringbuf_reserve/kconfig_origin
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/240717_170536___bpf_ringbuf_reserve/bisect_info.log
v6.10 bzImage: https://github.com/xupengfe/syzkaller_logs/raw/main/240717_170536___bpf_ringbuf_reserve/bzImage_0c3836482481200ead7b416ca80c68a29cfdaabd.tar.gz
Issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/240717_170536___bpf_ringbuf_reserve/0c3836482481200ead7b416ca80c68a29cfdaabd_dmesg.log
"
[ 25.063013]
[ 25.063211] ============================================
[ 25.063694] WARNING: possible recursive locking detected
[ 25.064165] 6.10.0-0c3836482481 #1 Tainted: G W
[ 25.064787] --------------------------------------------
[ 25.065264] repro/745 is trying to acquire lock:
[ 25.065693] ffffc90004f1a0d8 (&rb->spinlock){-.-.}-{2:2}, at: __bpf_ringbuf_reserve+0x386/0x460
[ 25.066517]
[ 25.066517] but task is already holding lock:
[ 25.067054] ffffc900018360d8 (&rb->spinlock){-.-.}-{2:2}, at: __bpf_ringbuf_reserve+0x386/0x460
[ 25.067878]
[ 25.067878] other info that might help us debug this:
[ 25.068504] Possible unsafe locking scenario:
[ 25.068504]
[ 25.069061] CPU0
[ 25.069301] ----
[ 25.069540] lock(&rb->spinlock);
[ 25.069879] lock(&rb->spinlock);
[ 25.070208]
[ 25.070208] *** DEADLOCK ***
[ 25.070208]
[ 25.070741] May be due to missing lock nesting notation
[ 25.070741]
[ 25.071362] 4 locks held by repro/745:
[ 25.071731] #0: ffffffff86fff388 (pcpu_alloc_mutex){+.+.}-{3:3}, at: pcpu_alloc_noprof+0xa07/0x1120
[ 25.072674] #1: ffffffff86e58de0 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run2+0x1b7/0x5a0
[ 25.073493] #2: ffffc900018360d8 (&rb->spinlock){-.-.}-{2:2}, at: __bpf_ringbuf_reserve+0x386/0x460
[ 25.074359] #3: ffffffff86e58de0 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run2+0x1b7/0x5a0
[ 25.075180]
[ 25.075180] stack backtrace:
[ 25.075587] CPU: 0 PID: 745 Comm: repro Tainted: G W 6.10.0-0c3836482481 #1
[ 25.076373] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 25.077661] Call Trace:
[ 25.078033] <TASK>
[ 25.078253] dump_stack_lvl+0xea/0x150
[ 25.078650] dump_stack+0x19/0x20
[ 25.079003] print_deadlock_bug+0x3c0/0x680
[ 25.079417] __lock_acquire+0x2b2a/0x5ca0
[ 25.079829] ? __pfx___lock_acquire+0x10/0x10
[ 25.080270] ? __kasan_check_read+0x15/0x20
[ 25.080693] ? __lock_acquire+0xccf/0x5ca0
[ 25.081101] lock_acquire+0x1ce/0x580
[ 25.081472] ? __bpf_ringbuf_reserve+0x386/0x460
[ 25.081926] ? __pfx_lock_acquire+0x10/0x10
[ 25.082343] ? __kasan_check_read+0x15/0x20
[ 25.082770] _raw_spin_lock_irqsave+0x52/0x80
[ 25.083202] ? __bpf_ringbuf_reserve+0x386/0x460
[ 25.083920] __bpf_ringbuf_reserve+0x386/0x460
[ 25.084487] bpf_ringbuf_reserve+0x63/0xa0
[ 25.084904] bpf_prog_9efe54833449f08e+0x2d/0x47
[ 25.085383] bpf_trace_run2+0x238/0x5a0
[ 25.085784] ? __pfx_bpf_trace_run2+0x10/0x10
[ 25.086237] ? __pfx___bpf_trace_contention_end+0x10/0x10
[ 25.086779] __bpf_trace_contention_end+0xf/0x20
[ 25.087230] __traceiter_contention_end+0x66/0xb0
[ 25.087697] trace_contention_end.constprop.0+0xdc/0x140
[ 25.088207] __pv_queued_spin_lock_slowpath+0x2a1/0xc80
[ 25.088751] ? __pfx___pv_queued_spin_lock_slowpath+0x10/0x10
[ 25.089369] ? __this_cpu_preempt_check+0x21/0x30
[ 25.089833] ? lock_acquire+0x1de/0x580
[ 25.090222] do_raw_spin_lock+0x1fb/0x280
[ 25.090622] ? __pfx_do_raw_spin_lock+0x10/0x10
[ 25.091056] ? debug_smp_processor_id+0x20/0x30
[ 25.091506] ? rcu_is_watching+0x19/0xc0
[ 25.091900] _raw_spin_lock_irqsave+0x5a/0x80
[ 25.092337] ? __bpf_ringbuf_reserve+0x386/0x460
[ 25.092791] __bpf_ringbuf_reserve+0x386/0x460
[ 25.093269] bpf_ringbuf_reserve+0x63/0xa0
[ 25.093694] bpf_prog_9efe54833449f08e+0x2d/0x47
[ 25.094138] bpf_trace_run2+0x238/0x5a0
[ 25.094525] ? __pfx_bpf_trace_run2+0x10/0x10
[ 25.094963] ? lock_acquire+0x1de/0x580
[ 25.095344] ? __pfx_lock_acquire+0x10/0x10
[ 25.095766] ? __pfx___bpf_trace_contention_end+0x10/0x10
[ 25.096296] __bpf_trace_contention_end+0xf/0x20
[ 25.096755] __traceiter_contention_end+0x66/0xb0
[ 25.097245] trace_contention_end+0xc5/0x120
[ 25.097699] __mutex_lock+0x257/0x1660
[ 25.098077] ? pcpu_alloc_noprof+0xa07/0x1120
[ 25.098518] ? __pfx___lock_acquire+0x10/0x10
[ 25.098951] ? _find_first_bit+0x95/0xc0
[ 25.099340] ? __pfx___mutex_lock+0x10/0x10
[ 25.099760] ? __this_cpu_preempt_check+0x21/0x30
[ 25.100223] ? lock_release+0x418/0x840
[ 25.100638] mutex_lock_killable_nested+0x1f/0x30
[ 25.101109] ? mutex_lock_killable_nested+0x1f/0x30
[ 25.101611] pcpu_alloc_noprof+0xa07/0x1120
[ 25.102034] ? lockdep_init_map_type+0x2df/0x810
[ 25.102488] ? __raw_spin_lock_init+0x44/0x120
[ 25.102931] ? __kasan_check_write+0x18/0x20
[ 25.103352] mm_init+0x8da/0xec0
[ 25.103692] copy_mm+0x3cf/0x2550
[ 25.104040] ? __pfx_copy_mm+0x10/0x10
[ 25.104431] ? lockdep_init_map_type+0x2df/0x810
[ 25.104901] ? __raw_spin_lock_init+0x44/0x120
[ 25.105371] copy_process+0x361c/0x6a60
[ 25.105776] ? __pfx_copy_process+0x10/0x10
[ 25.106194] ? __kasan_check_read+0x15/0x20
[ 25.106607] ? __lock_acquire+0x1a02/0x5ca0
[ 25.107033] kernel_clone+0xfd/0x8d0
[ 25.107396] ? __pfx_kernel_wait4+0x10/0x10
[ 25.107811] ? __pfx_kernel_clone+0x10/0x10
[ 25.108214] ? __this_cpu_preempt_check+0x21/0x30
[ 25.108736] ? lock_release+0x418/0x840
[ 25.109144] __do_sys_clone+0xe1/0x120
[ 25.109529] ? __pfx___do_sys_clone+0x10/0x10
[ 25.109999] __x64_sys_clone+0xc7/0x150
[ 25.110375] ? syscall_trace_enter+0x14a/0x230
[ 25.110815] x64_sys_call+0x1e76/0x20d0
[ 25.111188] do_syscall_64+0x6d/0x140
[ 25.111559] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 25.112045] RIP: 0033:0x7f6219f189d7
[ 25.112415] Code: 00 00 00 f3 0f 1e fa 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 39 41 89 c0 85 c0 75 2a 64 48 8b 04 25 10 00
[ 25.114082] RSP: 002b:00007fff149665d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[ 25.115078] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f6219f189d7
[ 25.115792] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[ 25.116487] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000007194fa985
[ 25.117136] R10: 00007f621a028a10 R11: 0000000000000246 R12: 0000000000000000
[ 25.117793] R13: 0000000000401e31 R14: 0000000000403e08 R15: 00007f621a073000
[ 25.118449] </TASK>
"
Thank you!
---
If you don't need the following environment to reproduce the problem, or if you
already have a reproduced environment, please ignore the following information.
How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
// start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
// You could change the bzImage_xxx as you want
// Maybe you need to remove the line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for a different qemu version
You could use the command below to log in; there is no password for root.
ssh -p 10023 root@...alhost
After logging in to the VM (virtual machine) successfully, you can transfer the
reproducer binary to the VM as below, and reproduce the problem in the VM:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@...alhost:/root/
Get the bzImage for the target kernel:
Please use the target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage // x should be equal to or less than the number of CPUs your PC has
Fill the bzImage file into the above start3.sh to load the target kernel in the VM.
Tips:
If you already have qemu-system-x86_64, please ignore the info below.
To install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install
Best Regards,
Thanks!