lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAADnVQK-ov_rve4pM7McMDQd5E9U5-JPjT5522BaVWDH-NvM5g@mail.gmail.com>
Date: Thu, 7 Sep 2023 08:40:46 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Toke Høiland-Jørgensen <toke@...nel.org>
Cc: Kumar Kartikeya Dwivedi <memxor@...il.com>, Hsin-Wei Hung <hsinweih@....edu>, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <kafai@...com>, Song Liu <songliubraving@...com>, 
	Yonghong Song <yhs@...com>, John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, 
	Network Development <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>
Subject: Re: Possible deadlock in bpf queue map

On Thu, Sep 7, 2023 at 6:04 AM Toke Høiland-Jørgensen <toke@...nel.org> wrote:
>
> Kumar Kartikeya Dwivedi <memxor@...il.com> writes:
>
> > On Thu, 7 Sept 2023 at 12:26, Toke Høiland-Jørgensen <toke@...nel.org> wrote:
> >>
> >> +Arnaldo
> >>
> >> > Hi,
> >> >
> >> > Our bpf fuzzer, a customized Syzkaller, triggered a lockdep warning in
> >> > the bpf queue map in v5.15. Since queue_stack_maps.c has no major changes
> >> > since v5.15, we think this should still exist in the latest kernel.
> >> > The bug can be occasionally triggered, and we suspect one of the
> >> > eBPF programs involved to be the following one. We also attached the lockdep
> >> > warning at the end.
> >> >
> >> > #define DEFINE_BPF_MAP_NO_KEY(the_map, TypeOfMap, MapFlags,
> >> > TypeOfValue, MaxEntries) \
> >> >         struct {                                                        \
> >> >             __uint(type, TypeOfMap);                                    \
> >> >             __uint(map_flags, (MapFlags));                              \
> >> >             __uint(max_entries, (MaxEntries));                          \
> >> >             __type(value, TypeOfValue);                                 \
> >> >         } the_map SEC(".maps");
> >> >
> >> > DEFINE_BPF_MAP_NO_KEY(map_0, BPF_MAP_TYPE_QUEUE, 0 | BPF_F_WRONLY,
> >> > struct_0, 162);
> >> > SEC("perf_event")
> >> > int func(struct bpf_perf_event_data *ctx) {
> >> >         char v0[96] = {};
> >> >         uint64_t v1 = 0;
> >> >         v1 = bpf_map_pop_elem(&map_0, v0);
> >> >         return 163819661;
> >> > }
> >> >
> >> >
> >> > The program is attached to the following perf event.
> >> >
> >> > struct perf_event_attr attr_type_hw = {
> >> >         .type = PERF_TYPE_HARDWARE,
> >> >         .config = PERF_COUNT_HW_CPU_CYCLES,
> >> >         .sample_freq = 50,
> >> >         .inherit = 1,
> >> >         .freq = 1,
> >> > };
> >> >
> >> > ================================WARNING: inconsistent lock state
> >> > 5.15.26+ #2 Not tainted
> >> > --------------------------------
> >> > inconsistent {INITIAL USE} -> {IN-NMI} usage.
> >> > syz-executor.5/19749 [HC1[1]:SC0[0]:HE0:SE1] takes:
> >> > ffff88804c9fc198 (&qs->lock){..-.}-{2:2}, at: __queue_map_get+0x31/0x250
> >> > {INITIAL USE} state was registered at:
> >> >   lock_acquire+0x1a3/0x4b0
> >> >   _raw_spin_lock_irqsave+0x48/0x60
> >> >   __queue_map_get+0x31/0x250
> >> >   bpf_prog_577904e86c81dead_func+0x12/0x4b4
> >> >   trace_call_bpf+0x262/0x5d0
> >> >   perf_trace_run_bpf_submit+0x91/0x1c0
> >> >   perf_trace_sched_switch+0x46c/0x700
> >> >   __schedule+0x11b5/0x24a0
> >> >   schedule+0xd4/0x270
> >> >   futex_wait_queue_me+0x25f/0x520
> >> >   futex_wait+0x1e0/0x5f0
> >> >   do_futex+0x395/0x1890
> >> >   __x64_sys_futex+0x1cb/0x480
> >> >   do_syscall_64+0x3b/0xc0
> >> >   entry_SYSCALL_64_after_hwframe+0x44/0xae
> >> > irq event stamp: 13640
> >> > hardirqs last  enabled at (13639): [<ffffffff95eb2bf4>]
> >> > _raw_spin_unlock_irq+0x24/0x40
> >> > hardirqs last disabled at (13640): [<ffffffff95eb2d4d>]
> >> > _raw_spin_lock_irqsave+0x5d/0x60
> >> > softirqs last  enabled at (13464): [<ffffffff93e26de5>] __sys_bpf+0x3e15/0x4e80
> >> > softirqs last disabled at (13462): [<ffffffff93e26da3>] __sys_bpf+0x3dd3/0x4e80
> >> >
> >> > other info that might help us debug this:
> >> >  Possible unsafe locking scenario:
> >> >
> >> >        CPU0
> >> >        ----
> >> >   lock(&qs->lock);
> >> >   <Interrupt>
> >> >     lock(&qs->lock);
> >>
> >> Hmm, so that lock() uses raw_spin_lock_irqsave(), which *should* be
> >> disabling interrupts entirely for the critical section. But I guess a
> >> Perf hardware event can still trigger? Which seems like it would
> >> potentially wreak havoc with lots of things, not just this queue map
> >> function?
> >>
> >> No idea how to protect against this, though. Hoping Arnaldo knows? :)
> >>
> >
> > The locking should probably be protected by a percpu integer counter,
> > incremented and decremented before and after the lock is taken,
> > respectively. If it is already non-zero, then -EBUSY should be
> > returned. It is similar to what htab_lock_bucket protects against in
> > hashtab.c.
>
> Ah, neat! Okay, seems straight-forward enough to replicate. Hsin, could
> you please check if the patch below gets rid of the splat?

Instead of adding all this complexity for the map that is so rarely used
it's easier to disallow it perf_event prog types.
Or add a single if (in_nmi()) return -EBUSY.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ