[<prev] [next>] [day] [month] [year] [list]
Message-ID: <CAADnVQKct7yFRwXeQapvQeMd8BbqxsTtyNGG9Ey=s154Dcifnw@mail.gmail.com>
Date: Wed, 22 Mar 2023 15:30:30 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Teng Qi <starmiku1207184332@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
baijiaju1990@...look.com
Subject: Re: [PATCH v2] kernel: bpf: stackmap: fix a possible sleep-in-atomic
bug in bpf_mmap_unlock_get_irq_work()
On Wed, Mar 22, 2023 at 3:26 AM Teng Qi <starmiku1207184332@...il.com> wrote:
>
> We are returning to this possible bug because it is better not to give up.
> Please don't mind any previous retreats.
>
> The reason we are only fixing the mmap_read_unlock() function is that our goal
> is to develop a static analysis tool to detect bugs in ebpf. According to the
> tool's output, we found that the mmap_read_unlock() function may be called
> indirectly by ebpf hooks in a context where preemption is disabled, which may
> lead to sleepable function calls through this code path.
>
>
> kernel/bpf/mmap_unlock_work.h:52 mmap_read_unlock(mm);
bpf rcu scope is a 1st non sleepable line.
> include/linux/mmap_lock.h:144 up_read(&mm->mmap_lock);
> kernel/locking/rwsem.c:1616 __up_read(sem);
> kernel/locking/rwsem.c:1352 rwsem_wake(sem); <- preempt_disable()
and this is 2nd non sleepable line.
> kernel/locking/rwsem.c:1211 rwsem_mark_wake(sem, RWSEM_WAKE_ANY,
> &wake_q); <- raw_spin_lock_irqsave()
and this is 3rd.
> kernel/locking/rwsem.c:566 wake_q_add_safe(wake_q, tsk);
> kernel/sched/core.c:990 put_task_struct(task);
> include/linux/sched/task.h:119 __put_task_struct(t);
> kernel/fork.c:857 exit_creds(tsk);
> kernel/cred.c:172 put_cred(cred);
> include/linux/cred.h:288 __put_cred(cred);
> kernel/cred.c:151 put_cred_rcu(&cred->rcu);
> kernel/cred.c:121 put_group_info(cred->group_info);
> include/linux/cred.h:53 groups_free(group_info);
> kernel/groups.c:31 kvfree(group_info);
> mm/util.c:647 vfree(addr); <- oops, sleep when size of group_info is large
>
>
> Our focus has been on detecting and fixing bugs in ebpf, and we were not
> previously aware that vfree() might be called in other contexts where preemption
> is disabled.
preemption and non-sleepable are not the same.
> Additionally, you mentioned that rwsem_wake() calls
> preempt_disable(). Upon investigating the code path, we discovered another
> occurrence of raw_spin_lock_irqsave() in rwsem_mark_wake(). We understand that
> our tool does not currently account for context operations from helpers to
> sleepable functions.
>
> To address this limitation, we have decided to enhance our tool's capabilities
> to collect and display information on context operations in the callee functions
> of helpers and potential callers of sleepable functions. However, this work will
> require some time. Consequently, we have decided to abandon this patch before.
>
> At present, we are uncertain about how to fix this potential and theoretical
> bug. One potential solution could be to replace the use of kvfree() with
> kfree_rcu() in groups_free(). Among the callees in put_group_info(),
> groups_free() is the only one that may call sleepable kvfree(). Therefore, we
> propose modifying groups_free() to ensure that put_group_info() does not sleep.
Do not fix what is not broken.
So far you haven't demonstrated that this stack trace is possible.
Powered by blists - more mailing lists