netdev - Re: [syzbot] [bpf?] possible deadlock in __bpf_ringbuf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Wed, 13 Mar 2024 10:04:11 +0100
From: Jiri Olsa <olsajiri@...il.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Jiri Olsa <olsajiri@...il.com>,
	syzbot <syzbot+850aaf14624dc0c6d366@...kaller.appspotmail.com>,
	andrii@...nel.org, ast@...nel.org, bpf@...r.kernel.org,
	daniel@...earbox.net, haoluo@...gle.com, john.fastabend@...il.com,
	kpsingh@...nel.org, linux-kernel@...r.kernel.org,
	martin.lau@...ux.dev, netdev@...r.kernel.org, sdf@...gle.com,
	song@...nel.org, syzkaller-bugs@...glegroups.com,
	yonghong.song@...ux.dev
Subject: Re: [syzbot] [bpf?] possible deadlock in __bpf_ringbuf_reserve

On Tue, Mar 12, 2024 at 03:37:16PM -0700, Andrii Nakryiko wrote:
> On Tue, Mar 12, 2024 at 2:18 PM Jiri Olsa <olsajiri@...il.com> wrote:
> >
> > On Tue, Mar 12, 2024 at 10:02:27PM +0100, Jiri Olsa wrote:
> > > On Tue, Mar 12, 2024 at 09:41:26AM -0700, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit:    df4793505abd Merge tag 'net-6.8-rc8' of git://git.kernel.o..
> > > > git tree:       bpf
> > > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=11fd0092180000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=c11c5c676adb61f0
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=850aaf14624dc0c6d366
> > > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1509c4ae180000
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10babc01180000
> > > >
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/d2e80ee1112b/disk-df479350.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/b35ea54cd190/vmlinux-df479350.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/59f69d999ad2/bzImage-df479350.xz
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+850aaf14624dc0c6d366@...kaller.appspotmail.com
> > > >
> > > > ============================================
> > > > WARNING: possible recursive locking detected
> > > > 6.8.0-rc7-syzkaller-gdf4793505abd #0 Not tainted
> > > > --------------------------------------------
> > > > strace-static-x/5063 is trying to acquire lock:
> > > > ffffc900096f10d8 (&rb->spinlock){-.-.}-{2:2}, at: __bpf_ringbuf_reserve+0x211/0x4f0 kernel/bpf/ringbuf.c:424
> > > >
> > > > but task is already holding lock:
> > > > ffffc900098410d8 (&rb->spinlock){-.-.}-{2:2}, at: __bpf_ringbuf_reserve+0x211/0x4f0 kernel/bpf/ringbuf.c:424
> > > >
> > > > other info that might help us debug this:
> > > >  Possible unsafe locking scenario:
> > > >
> > > >        CPU0
> > > >        ----
> > > >   lock(&rb->spinlock);
> > > >   lock(&rb->spinlock);
> > > >
> > > >  *** DEADLOCK ***
> > > >
> > > >  May be due to missing lock nesting notation
> > > >
> > > > 4 locks held by strace-static-x/5063:
> > > >  #0: ffff88807857e068 (&pipe->mutex/1){+.+.}-{3:3}, at: __pipe_lock fs/pipe.c:103 [inline]
> > > >  #0: ffff88807857e068 (&pipe->mutex/1){+.+.}-{3:3}, at: pipe_write+0x1cc/0x1a40 fs/pipe.c:465
> > > >  #1: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
> > > >  #1: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
> > > >  #1: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: __bpf_trace_run kernel/trace/bpf_trace.c:2380 [inline]
> > > >  #1: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run2+0x114/0x420 kernel/trace/bpf_trace.c:2420
> > > >  #2: ffffc900098410d8 (&rb->spinlock){-.-.}-{2:2}, at: __bpf_ringbuf_reserve+0x211/0x4f0 kernel/bpf/ringbuf.c:424
> > > >  #3: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
> > > >  #3: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
> > > >  #3: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: __bpf_trace_run kernel/trace/bpf_trace.c:2380 [inline]
> > > >  #3: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run2+0x114/0x420 kernel/trace/bpf_trace.c:2420
> > > >
> > > > stack backtrace:
> > > > CPU: 0 PID: 5063 Comm: strace-static-x Not tainted 6.8.0-rc7-syzkaller-gdf4793505abd #0
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
> > > > Call Trace:
> > > >  <TASK>
> > > >  __dump_stack lib/dump_stack.c:88 [inline]
> > > >  dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
> > > >  check_deadlock kernel/locking/lockdep.c:3062 [inline]
> > > >  validate_chain+0x15c0/0x58e0 kernel/locking/lockdep.c:3856
> > > >  __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
> > > >  lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
> > > >  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> > > >  _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
> > > >  __bpf_ringbuf_reserve+0x211/0x4f0 kernel/bpf/ringbuf.c:424
> > > >  ____bpf_ringbuf_reserve kernel/bpf/ringbuf.c:459 [inline]
> > > >  bpf_ringbuf_reserve+0x5c/0x70 kernel/bpf/ringbuf.c:451
> > > >  bpf_prog_9efe54833449f08e+0x2d/0x47
> > > >  bpf_dispatcher_nop_func include/linux/bpf.h:1231 [inline]
> > > >  __bpf_prog_run include/linux/filter.h:651 [inline]
> > > >  bpf_prog_run include/linux/filter.h:658 [inline]
> > > >  __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
> > >
> > > hum, scratching my head how this could passed through the prog->active check,
> >
> > nah could be 2 instances of the same program, got confused by the tag
> >
> > trace_contention_end
> >   __bpf_trace_run(prog1)
> >     bpf_prog_9efe54833449f08e
> >       bpf_ringbuf_reserve
> >         trace_contention_end
> >           __bpf_trace_run(prog1)  prog1->active check fails
> >           __bpf_trace_run(prog2)
> >             bpf_prog_9efe54833449f08e
> >               bpf_ringbuf_reserve
> >                 lockup
> >
> > we had similar issue in [1] and we replaced the lock with extra buffers,
> > not sure that's possible in bpf_ringbuf_reserve
> >
> 
> Having trace_contention_begin and trace_contention_end in such
> low-level parts of ringbuf (and I'm sure anything in BPF that's using
> spinlock) is unfortunate. I'm not sure what's the best solution, but
> it would be great if we had ability to disable these tracepoints when
> taking lock in low-level BPF infrastructure. Given BPF programs can
> attach to these tracepoints, it's best to avoid this arbitrary nesting
> of BPF ringbuf calls. Also note, no per-program protection will help,
> because it can be independent BPF programs using the same map.

one of the initial attempts for the previous problem was to deny
the attach of programs calling printk to printk tracepoint:
  https://lore.kernel.org/bpf/20221121213123.1373229-1-jolsa@kernel.org/

how about we overload the bpf contention tracepoints callbacks and
make it conditional like outlined below.. but not sure it'd be
feasible on the lock/unlock calling sides to use this

jirka


---
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0a5c4efc73c3..c17b7eaab440 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2347,13 +2347,42 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
 extern struct bpf_raw_event_map __start__bpf_raw_tp[];
 extern struct bpf_raw_event_map __stop__bpf_raw_tp[];
 
+extern struct tracepoint __tracepoint_contention_begin;
+
+#define __CAST_TO_U64(x) ({ \
+	typeof(x) __src = (x); \
+	UINTTYPE(sizeof(x)) __dst; \
+	memcpy(&__dst, &__src, sizeof(__dst)); \
+	(u64)__dst; })
+
+int contention_tps_disable;
+
+static notrace void
+__bpf_trace_contention_begin_overload(void *__data, void *lock, unsigned int flags)
+{
+	struct bpf_prog *prog = __data;
+
+	if (contention_tps_disable)
+		return;
+
+	bpf_trace_run2(prog, __CAST_TO_U64(lock), __CAST_TO_U64(flags));
+}
+
+static struct bpf_raw_event_map* fixup(struct bpf_raw_event_map *btp)
+{
+	if (btp->tp == &__tracepoint_contention_begin)
+		btp->bpf_func = __bpf_trace_contention_begin_overload;
+
+	return btp;
+}
+
 struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name)
 {
 	struct bpf_raw_event_map *btp = __start__bpf_raw_tp;
 
 	for (; btp < __stop__bpf_raw_tp; btp++) {
 		if (!strcmp(btp->tp->name, name))
-			return btp;
+			return fixup(btp);
 	}
 
 	return bpf_get_raw_tracepoint_module(name);