Message-ID: <CAADnVQ+pYevQ9QsRB-oLu1ONtzZ31J=3ANqB+aFLLiU4VcGgNA@mail.gmail.com>
Date: Thu, 19 Dec 2024 10:43:13 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Yonghong Song <yonghong.song@...ux.dev>
Cc: Abel Wu <wuyun.abel@...edance.com>, Martin KaFai Lau <martin.lau@...ux.dev>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
David Vernet <void@...ifault.com>,
"open list:BPF [STORAGE & CGROUPS]" <bpf@...r.kernel.org>, open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH bpf] bpf: Fix deadlock when freeing cgroup storage
On Thu, Dec 19, 2024 at 10:39 AM Yonghong Song <yonghong.song@...ux.dev> wrote:
>
> On 12/19/24 4:38 AM, Abel Wu wrote:
> > Hi Yonghong,
> >
> > On 12/19/24 10:45 AM, Yonghong Song wrote:
> >>
> >> On 12/18/24 1:21 AM, Abel Wu wrote:
> >>> Commit bc235cdb423a ("bpf: Prevent deadlock from recursive
> >>> bpf_task_storage_[get|delete]") first introduced deadlock prevention
> >>> for fentry/fexit programs attaching to the bpf_task_storage helpers.
> >>> Since its v6 version, that commit also applied the same logic in the
> >>> map free path.
> >>>
> >>> Later, bpf_cgrp_storage was introduced in commit c4bcfb38a95e ("bpf:
> >>> Implement cgroup storage available to non-cgroup-attached bpf progs")
> >>> and faces the same issue as bpf_task_storage. However, instead of its
> >>> busy counter, NULL was passed to bpf_local_storage_map_free(), which
> >>> opened a window for deadlock:
> >>>
> >>> <TASK>
> >>> _raw_spin_lock_irqsave+0x3d/0x50
> >>> bpf_local_storage_update+0xd1/0x460
> >>> bpf_cgrp_storage_get+0x109/0x130
> >>> bpf_prog_72026450ec387477_cgrp_ptr+0x38/0x5e
> >>> bpf_trace_run1+0x84/0x100
> >>> cgroup_storage_ptr+0x4c/0x60
> >>> bpf_selem_unlink_storage_nolock.constprop.0+0x135/0x160
> >>> bpf_selem_unlink_storage+0x6f/0x110
> >>> bpf_local_storage_map_free+0xa2/0x110
> >>> bpf_map_free_deferred+0x5b/0x90
> >>> process_one_work+0x17c/0x390
> >>> worker_thread+0x251/0x360
> >>> kthread+0xd2/0x100
> >>> ret_from_fork+0x34/0x50
> >>> ret_from_fork_asm+0x1a/0x30
> >>> </TASK>
> >>>
> >>> [ Since the verifier treats 'void *' as scalar which
> >>> prevents me from getting a pointer to 'struct cgroup *',
> >>> I added a raw tracepoint in cgroup_storage_ptr() to
> >>> help reproducing this issue. ]
> >>>
> >>> Although it is tricky to reproduce, the risk of deadlock exists and
> >>> is worth fixing: pass the busy counter to the free procedure so it
> >>> can be properly incremented before the storage/smap locks are taken.
> >>
> >> The above stack trace and explanation do not show that we will have
> >> a potential deadlock here. You mentioned that it is tricky to
> >> reproduce; does that mean you have done some analysis or coding to
> >> reproduce it? Could you share the details on why you think we may
> >> have a deadlock here?
> >
> > The stack shows an AA deadlock: cgroup_storage_ptr() is called with
> > storage->lock held, while the bpf_prog attached to this function also
> > tries to acquire the same lock by calling bpf_cgrp_storage_get().
> >
> > The tricky part is that, instead of attaching to cgroup_storage_ptr()
> > directly, I added a tracepoint inside it to hook:
> >
> > ------
> > diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
> > index 20f05de92e9c..679209d4f88f 100644
> > --- a/kernel/bpf/bpf_cgrp_storage.c
> > +++ b/kernel/bpf/bpf_cgrp_storage.c
> > @@ -40,6 +40,8 @@ static struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner)
> >  {
> >  	struct cgroup *cg = owner;
> >  
> > +	trace_cgroup_ptr(cg);
> > +
> >  	return &cg->bpf_cgrp_storage;
> >  }
> >
> > ------
> >
> > The reason for doing so is that typecasting from 'void *owner' to
> > 'struct cgroup *' will be rejected by the verifier. But there could
> > be other ways to obtain a pointer to the @owner cgroup too, making
> > the deadlock possible.
>
> I checked the call stack and what you described is indeed the case.
> In bpf_selem_unlink_storage(), local_storage->lock is held before
> calling bpf_selem_unlink_storage_nolock()/cgroup_storage_ptr().
> If there is a fentry/tracepoint on cgroup_storage_ptr(), then we
> could have a deadlock as you described above.
>
> As you mentioned, it is tricky to reproduce. fentry on
> cgroup_storage_ptr() does not work due to its func signature:
>   struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner)
> Even if we supported 'void *' for fentry and used bpf_rdonly_cast()
> to cast 'void *owner' to 'struct cgroup *owner', that owner pointer
> still could not be passed to a helper/kfunc.
>
> Your fix looks good, but it would be great to have a reproducer.
> One possibility is to find a function that can be fentry'd while
> local_storage->lock is held. If you know the cgroup id, in the bpf
> program you can use bpf_cgroup_from_id() to get a trusted cgroup ptr
> from the id, and then use that cgroup ptr to do bpf_cgrp_storage_get()
> etc., which should be able to trigger the deadlock. Could you give it
> a try?
I'd rather mark a set of functions as notrace to avoid this situation,
or add:
CFLAGS_REMOVE_bpf_cgrp_storage.o = $(CC_FLAGS_FTRACE)