[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALyQVayW7e4FPbaMNNuOmYGYt5pcd47zsx2xVkrekEDaVm7H2g@mail.gmail.com>
Date: Wed, 24 May 2023 20:42:57 +0800
From: Teng Qi <starmiku1207184332@...il.com>
To: Yonghong Song <yhs@...a.com>
Cc: ast@...nel.org, daniel@...earbox.net, john.fastabend@...il.com,
andrii@...nel.org, martin.lau@...ux.dev, song@...nel.org, yhs@...com,
kpsingh@...nel.org, sdf@...gle.com, haoluo@...gle.com, jolsa@...nel.org,
davem@...emloft.net, kuba@...nel.org, hawk@...nel.org, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [bug] kernel: bpf: syscall: a possible sleep-in-atomic bug in __bpf_prog_put()
Thank you.
> We cannot use rcu_read_lock_held() in the 'if' statement. The return
> value rcu_read_lock_held() could be 1 for some configurations regardless
> whether rcu_read_lock() is really held or not. In most cases,
> rcu_read_lock_held() is used in issuing potential warnings.
> Maybe there are other ways to record whether rcu_read_lock() is held or not?
Sorry. I was not aware of the dependency of configurations of
rcu_read_lock_held().
> If we cannot resolve rcu_read_lock() presence issue, maybe the condition
> can be !in_interrupt(), so any process-context will go to a workqueue.
I agree that using !in_interrupt() as a condition is an acceptable solution.
> Alternatively, we could have another solution. We could add another
> function e.g., bpf_prog_put_rcu(), which indicates that bpf_prog_put()
> will be done in rcu context.
Implementing a new function like bpf_prog_put_rcu() is a solution that involves
more significant changes.
> So if in_interrupt(), do kvfree, otherwise,
> put into a workqueue.
Shall we proceed with submitting a patch following this approach?
I would like to mention something unrelated to the possible bug. At this
moment, things seem to be more puzzling. vfree() is safe under in_interrupt()
but not safe under other atomic contexts.
This disorder challenges our conventional belief, a monotonic incrementation
of limitations of the hierarchical atomic contexts, that programer needs
to be more and more careful to write code under rcu read lock, spin lock,
bh disable, interrupt...
This disorder can lead to unexpected consequences, such as code being safe
under interrupts but not safe under spin locks.
The disorder makes kernel programming more complex and may result in more bugs.
Even though we find a way to resolve the possible bug about the bpf_prog_put(),
I feel sad for undermining of kernel`s maintainability and disorder of
hierarchy of atomic contexts.
-- Teng Qi
On Tue, May 23, 2023 at 12:33 PM Yonghong Song <yhs@...a.com> wrote:
>
>
>
> On 5/21/23 6:39 AM, Teng Qi wrote:
> > Thank you.
> >
> > > Your above analysis makes sense if indeed that kvfree cannot appear
> > > inside a spin lock region or RCU read lock region. But is it true?
> > > I checked a few code paths in kvfree/kfree. It is either guarded
> > > with local_irq_save/restore or by
> > > spin_lock_irqsave/spin_unlock_
> > > irqrestore, etc. Did I miss
> > > anything? Are you talking about RT kernel here?
> >
> > To see the sleepable possibility of kvfree, it is important to analyze the
> > following calling stack:
> > mm/util.c: 645 kvfree()
> > mm/vmalloc.c: 2763 vfree()
> >
> > In kvfree(), to call vfree, if the pointer addr points to memory
> > allocated by
> > vmalloc(), it calls vfree().
> > void kvfree(const void *addr)
> > {
> > if (is_vmalloc_addr(addr))
> > vfree(addr);
> > else
> > kfree(addr);
> > }
> >
> > In vfree(), in_interrupt() and might_sleep() need to be considered.
> > void vfree(const void *addr)
> > {
> > // ...
> > if (unlikely(in_interrupt()))
> > {
> > vfree_atomic(addr);
> > return;
> > }
> > // ...
> > might_sleep();
> > // ...
> > }
>
> Sorry. I didn't check vfree path. So it does look like that
> we need to pay special attention to non interrupt part.
>
> >
> > The vfree() may sleep if in_interrupt() == false. The RCU read lock region
> > could have in_interrupt() == false and spin lock region which only disables
> > preemption also has in_interrupt() == false. So the kvfree() cannot appear
> > inside a spin lock region or RCU read lock region if the pointer addr points
> > to memory allocated by vmalloc().
> >
> > > > Therefore, we propose modifying the condition to include
> > > > in_atomic(). Could we
> > > > update the condition as follows: "in_irq() || irqs_disabled() ||
> > > > in_atomic()"?
> > > Thank you! We look forward to your feedback.
> >
> > We now think that ‘irqs_disabled() || in_atomic() ||
> > rcu_read_lock_held()’ is
> > more proper. irqs_disabled() is for irq flag reg, in_atomic() is for
> > preempt count and rcu_read_lock_held() is for RCU read lock region.
>
> We cannot use rcu_read_lock_held() in the 'if' statement. The return
> value rcu_read_lock_held() could be 1 for some configuraitons regardless
> whether rcu_read_lock() is really held or not. In most cases,
> rcu_read_lock_held() is used in issuing potential warnings.
> Maybe there are other ways to record whether rcu_read_lock() is held or not?
>
> I agree with your that 'irqs_disabled() || in_atomic()' makes sense
> since it covers process context local_irq_save() and spin_lock() cases.
>
> If we cannot resolve rcu_read_lock() presence issue, maybe the condition
> can be !in_interrupt(), so any process-context will go to a workqueue.
>
> Alternatively, we could have another solution. We could add another
> function e.g., bpf_prog_put_rcu(), which indicates that bpf_prog_put()
> will be done in rcu context. So if in_interrupt(), do kvfree, otherwise,
> put into a workqueue.
>
>
> >
> > -- Teng Qi
> >
> > On Sun, May 21, 2023 at 11:45 AM Yonghong Song <yhs@...a.com
> > <mailto:yhs@...a.com>> wrote:
> >
> >
> >
> > On 5/19/23 7:18 AM, Teng Qi wrote:
> > > Thank you for your response.
> > > > Looks like you only have suspicion here. Could you find a real
> > violation
> > > > here where __bpf_prog_put() is called with !in_irq() &&
> > > > !irqs_disabled(), but inside spin_lock or rcu read lock? I
> > have not seen
> > > > things like that.
> > >
> > > For the complex conditions to call bpf_prog_put() with 1 refcnt,
> > we have
> > > been
> > > unable to really trigger this atomic violation after trying to
> > construct
> > > test cases manually. But we found that it is possible to show
> > cases with
> > > !in_irq() && !irqs_disabled(), but inside spin_lock or rcu read lock.
> > > For example, even a failed case, one of selftest cases of bpf,
> > netns_cookie,
> > > calls bpf_sock_map_update() and may indirectly call bpf_prog_put()
> > > only inside rcu read lock: The possible call stack is:
> > > net/core/sock_map.c: 615 bpf_sock_map_update()
> > > net/core/sock_map.c: 468 sock_map_update_common()
> > > net/core/sock_map.c: 217 sock_map_link()
> > > kernel/bpf/syscall.c: 2111 bpf_prog_put()
> > >
> > > The files about netns_cookie include
> > > tools/testing/selftests/bpf/progs/netns_cookie_prog.c and
> > > tools/testing/selftests/bpf/prog_tests/netns_cookie.c. We
> > inserted the
> > > following code in
> > > ‘net/core/sock_map.c: 468 sock_map_update_common()’:
> > > static int sock_map_update_common(..)
> > > {
> > > int inIrq = in_irq();
> > > int irqsDisabled = irqs_disabled();
> > > int preemptBits = preempt_count();
> > > int inAtomic = in_atomic();
> > > int rcuHeld = rcu_read_lock_held();
> > > printk("in_irq() %d, irqs_disabled() %d, preempt_count() %d,
> > > in_atomic() %d, rcu_read_lock_held() %d", inIrq,
> > irqsDisabled,
> > > preemptBits, inAtomic, rcuHeld);
> > > }
> > >
> > > The output message is as follows:
> > > root@(none):/root/bpf# ./test_progs -t netns_cookie
> > > [ 137.639188] in_irq() 0, irqs_disabled() 0, preempt_count() 0,
> > > in_atomic() 0,
> > > rcu_read_lock_held() 1
> > > #113 netns_cookie:OK
> > > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> > >
> > > We notice that there are numerous callers in kernel/, net/ and
> > drivers/,
> > > so we
> > > highly suggest modifying __bpf_prog_put() to address this gap.
> > The gap
> > > exists
> > > because __bpf_prog_put() is only safe under in_irq() ||
> > irqs_disabled()
> > > but not in_atomic() || rcu_read_lock_held(). The following code
> > snippet may
> > > mislead developers into thinking that bpf_prog_put() is safe in all
> > > contexts.
> > > if (in_irq() || irqs_disabled()) {
> > > INIT_WORK(&aux->work, bpf_prog_put_deferred);
> > > schedule_work(&aux->work);
> > > } else {
> > > bpf_prog_put_deferred(&aux->work);
> > > }
> > >
> > > Implicit dependency may lead to issues.
> > >
> > > > Any problem here?
> > > We mentioned it to demonstrate the possibility of kvfree() being
> > > called by __bpf_prog_put_noref().
> > >
> > > Thanks.
> > > -- Teng Qi
> > >
> > > On Wed, May 17, 2023 at 1:08 AM Yonghong Song <yhs@...a.com
> > <mailto:yhs@...a.com>
> > > <mailto:yhs@...a.com <mailto:yhs@...a.com>>> wrote:
> > >
> > >
> > >
> > > On 5/16/23 4:18 AM, starmiku1207184332@...il.com
> > <mailto:starmiku1207184332@...il.com>
> > > <mailto:starmiku1207184332@...il.com
> > <mailto:starmiku1207184332@...il.com>> wrote:
> > > > From: Teng Qi <starmiku1207184332@...il.com
> > <mailto:starmiku1207184332@...il.com>
> > > <mailto:starmiku1207184332@...il.com
> > <mailto:starmiku1207184332@...il.com>>>
> > > >
> > > > Hi, bpf developers,
> > > >
> > > > We are developing a static tool to check the matching between
> > > helpers and the
> > > > context of hooks. During our analysis, we have discovered some
> > > important
> > > > findings that we would like to report.
> > > >
> > > > ‘kernel/bpf/syscall.c: 2097 __bpf_prog_put()’ shows that
> > function
> > > > bpf_prog_put_deferred() won`t be called in the condition of
> > > > ‘in_irq() || irqs_disabled()’.
> > > > if (in_irq() || irqs_disabled()) {
> > > > INIT_WORK(&aux->work, bpf_prog_put_deferred);
> > > > schedule_work(&aux->work);
> > > > } else {
> > > >
> > > > bpf_prog_put_deferred(&aux->work);
> > > > }
> > > >
> > > > We suspect this condition exists because there might be
> > sleepable
> > > operations
> > > > in the callees of the bpf_prog_put_deferred() function:
> > > > kernel/bpf/syscall.c: 2097 __bpf_prog_put()
> > > > kernel/bpf/syscall.c: 2084 bpf_prog_put_deferred()
> > > > kernel/bpf/syscall.c: 2063 __bpf_prog_put_noref()
> > > > kvfree(prog->aux->jited_linfo);
> > > > kvfree(prog->aux->linfo);
> > >
> > > Looks like you only have suspicion here. Could you find a real
> > > violation
> > > here where __bpf_prog_put() is called with !in_irq() &&
> > > !irqs_disabled(), but inside spin_lock or rcu read lock? I
> > have not seen
> > > things like that.
> > >
> > > >
> > > > Additionally, we found that array prog->aux->jited_linfo is
> > > initialized in
> > > > ‘kernel/bpf/core.c: 157 bpf_prog_alloc_jited_linfo()’:
> > > > prog->aux->jited_linfo = kvcalloc(prog->aux->nr_linfo,
> > > > sizeof(*prog->aux->jited_linfo),
> > bpf_memcg_flags(GFP_KERNEL |
> > > __GFP_NOWARN));
> > >
> > > Any problem here?
> > >
> > > >
> > > > Our question is whether the condition 'in_irq() ||
> > > irqs_disabled() == false' is
> > > > sufficient for calling 'kvfree'. We are aware that calling
> > > 'kvfree' within the
> > > > context of a spin lock or an RCU lock is unsafe.
> >
> > Your above analysis makes sense if indeed that kvfree cannot appear
> > inside a spin lock region or RCU read lock region. But is it true?
> > I checked a few code paths in kvfree/kfree. It is either guarded
> > with local_irq_save/restore or by
> > spin_lock_irqsave/spin_unlock_irqrestore, etc. Did I miss
> > anything? Are you talking about RT kernel here?
> >
> >
> > > >
> > > > Therefore, we propose modifying the condition to include
> > > in_atomic(). Could we
> > > > update the condition as follows: "in_irq() ||
> > irqs_disabled() ||
> > > in_atomic()"?
> > > >
> > > > Thank you! We look forward to your feedback.
> > > >
> > > > Signed-off-by: Teng Qi <starmiku1207184332@...il.com
> > <mailto:starmiku1207184332@...il.com>
> > > <mailto:starmiku1207184332@...il.com
> > <mailto:starmiku1207184332@...il.com>>>
> > >
> >
Powered by blists - more mailing lists