Message-ID: <9537276.CDJkKcVGEf@7950hx>
Date: Fri, 14 Nov 2025 15:08:13 +0800
From: Menglong Dong <menglong.dong@...ux.dev>
To: andrii@...nel.org, ast@...nel.org, bpf@...r.kernel.org,
daniel@...earbox.net, eddyz87@...il.com, haoluo@...gle.com,
john.fastabend@...il.com, jolsa@...nel.org, kpsingh@...nel.org,
linux-kernel@...r.kernel.org, martin.lau@...ux.dev, netdev@...r.kernel.org,
sdf@...ichev.me, song@...nel.org, syzkaller-bugs@...glegroups.com,
yonghong.song@...ux.dev,
syzbot <syzbot+18b26edb69b2e19f3b33@...kaller.appspotmail.com>
Subject: Re: [syzbot] [bpf?] possible deadlock in bpf_lru_push_free (2)
On 2025/11/13 12:26, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: e427054ae7bc Merge branch 'x86-fgraph-bpf-fix-orc-stack-un..
> git tree: bpf
> console output: https://syzkaller.appspot.com/x/log.txt?x=136b70b4580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=e46b8a1c645465a9
> dashboard link: https://syzkaller.appspot.com/bug?extid=18b26edb69b2e19f3b33
> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10013c12580000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16541c12580000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/c1ac942fc5fb/disk-e427054a.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/be05ef12ba31/vmlinux-e427054a.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/c75604292a15/bzImage-e427054a.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+18b26edb69b2e19f3b33@...kaller.appspotmail.com
>
> ============================================
> WARNING: possible recursive locking detected
> syzkaller #0 Not tainted
> --------------------------------------------
> syz-executor149/10558 is trying to acquire lock:
> ffffe8ffffc41588 (&loc_l->lock){....}-{2:2}, at: bpf_common_lru_push_free kernel/bpf/bpf_lru_list.c:514 [inline]
> ffffe8ffffc41588 (&loc_l->lock){....}-{2:2}, at: bpf_lru_push_free+0x33b/0xbb0 kernel/bpf/bpf_lru_list.c:553
>
> but task is already holding lock:
> ffffe8ffffc41588 (&loc_l->lock){....}-{2:2}, at: bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:440 [inline]
> ffffe8ffffc41588 (&loc_l->lock){....}-{2:2}, at: bpf_lru_pop_free+0x1ab/0x19b0 kernel/bpf/bpf_lru_list.c:496
I was working on this issue by using rqspinlock for the LRU map:
https://lore.kernel.org/bpf/20251030030010.95352-1-dongml2@chinatelecom.cn/
However, the locking here is too complex. Take
htab_lru_map_update_elem for example: it pops a free node,
updates the hash table, and pushes the old node back to the LRU.
Both the pop and the push take a lock, which means that
both can fail. If the pop fails, we can return the
errno directly. But what can we do if the push
fails? In the batch update case, the situation becomes
much worse.
Hmm... I have not figured out a good solution yet; maybe we can
use some kind of transaction here. Is anyone else
working on this issue?
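The pop/update/push problem described above can be shown with a minimal
userspace sketch. This is not the kernel code: fallible_lock, fl_lock(),
lru_pop_free(), lru_push_free(), update_elem() and the acquisitions_left
test hook are all hypothetical names, the "hash table" is a single slot,
and the lock only mimics a lock whose acquisition may return an error.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock whose acquisition can fail. */
struct fallible_lock {
	bool held;
	int acquisitions_left;	/* test hook: fail at 0, negative = never fail */
};

static int fl_lock(struct fallible_lock *l)
{
	if (l->held || l->acquisitions_left == 0)
		return -EDEADLK;
	if (l->acquisitions_left > 0)
		l->acquisitions_left--;
	l->held = true;
	return 0;
}

static void fl_unlock(struct fallible_lock *l)
{
	l->held = false;
}

struct node {
	struct node *next;
	int key;
};

struct lru {
	struct fallible_lock lock;
	struct node *free_list;
};

static struct node *lru_pop_free(struct lru *lru)
{
	struct node *n;

	if (fl_lock(&lru->lock))
		return NULL;
	n = lru->free_list;
	if (n)
		lru->free_list = n->next;
	fl_unlock(&lru->lock);
	return n;
}

static int lru_push_free(struct lru *lru, struct node *n)
{
	int err = fl_lock(&lru->lock);

	if (err)
		return err;
	n->next = lru->free_list;
	lru->free_list = n;
	fl_unlock(&lru->lock);
	return 0;
}

static int count_free(const struct lru *lru)
{
	int cnt = 0;

	for (const struct node *n = lru->free_list; n; n = n->next)
		cnt++;
	return cnt;
}

/* Pop a free node, install it in the one-slot table, push the old node. */
static int update_elem(struct lru *lru, struct node **slot, int key)
{
	struct node *new_node = lru_pop_free(lru);
	struct node *old;

	if (!new_node)
		return -ENOMEM;	/* pop failed: nothing has changed yet */
	new_node->key = key;
	old = *slot;
	*slot = new_node;	/* the table is already updated here */
	if (old)
		return lru_push_free(lru, old);	/* on error, old is lost */
	return 0;
}

/* Returns how many nodes end up in neither the table nor the free list. */
static int demo_lost_nodes(void)
{
	struct node nodes[2];
	struct lru lru = { .lock = { false, -1 }, .free_list = &nodes[0] };
	struct node *slot = NULL;

	nodes[0].next = &nodes[1];
	nodes[1].next = NULL;
	update_elem(&lru, &slot, 1);		/* succeeds, no old node */
	lru.lock.acquisitions_left = 1;		/* pop succeeds, push fails */
	update_elem(&lru, &slot, 2);
	return 2 - count_free(&lru) - (slot ? 1 : 0);
}
```

In this sketch a failed pop is harmless because no state has changed, while
a failed push happens after the table has been updated, so the old node is
reachable from neither the table nor the free list.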
Thanks!
Menglong Dong
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&loc_l->lock);
> lock(&loc_l->lock);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 3 locks held by syz-executor149/10558:
> #0: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
> #0: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:867 [inline]
> #0: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: bpf_percpu_hash_update+0x2b/0x200 kernel/bpf/hashtab.c:2409
> #1: ffffe8ffffc41588 (&loc_l->lock){....}-{2:2}, at: bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:440 [inline]
> #1: ffffe8ffffc41588 (&loc_l->lock){....}-{2:2}, at: bpf_lru_pop_free+0x1ab/0x19b0 kernel/bpf/bpf_lru_list.c:496
> #2: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
> #2: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:867 [inline]
> #2: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: __bpf_trace_run kernel/trace/bpf_trace.c:2074 [inline]
> #2: ffffffff8df3d620 (rcu_read_lock){....}-{1:3}, at: bpf_trace_run2+0x186/0x4b0 kernel/trace/bpf_trace.c:2116
>
> stack backtrace:
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@...glegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup
>
>