[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <d027cb6b-e32c-36ad-3aba-9a7b1177f89f@meta.com>
Date: Mon, 22 May 2023 21:33:41 -0700
From: Yonghong Song <yhs@...a.com>
To: Teng Qi <starmiku1207184332@...il.com>
Cc: ast@...nel.org, daniel@...earbox.net, john.fastabend@...il.com,
andrii@...nel.org, martin.lau@...ux.dev, song@...nel.org,
yhs@...com, kpsingh@...nel.org, sdf@...gle.com, haoluo@...gle.com,
jolsa@...nel.org, davem@...emloft.net, kuba@...nel.org,
hawk@...nel.org, bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [bug] kernel: bpf: syscall: a possible sleep-in-atomic bug in
__bpf_prog_put()
On 5/21/23 6:39 AM, Teng Qi wrote:
> Thank you.
>
> > Your above analysis makes sense if indeed that kvfree cannot appear
> > inside a spin lock region or RCU read lock region. But is it true?
> > I checked a few code paths in kvfree/kfree. It is either guarded
> > with local_irq_save/restore or by
> > spin_lock_irqsave/spin_unlock_
> > irqrestore, etc. Did I miss
> > anything? Are you talking about RT kernel here?
>
> To see the sleepable possibility of kvfree, it is important to analyze the
> following calling stack:
> mm/util.c: 645 kvfree()
> mm/vmalloc.c: 2763 vfree()
>
> In kvfree(), to call vfree, if the pointer addr points to memory
> allocated by
> vmalloc(), it calls vfree().
> void kvfree(const void *addr)
> {
> if (is_vmalloc_addr(addr))
> vfree(addr);
> else
> kfree(addr);
> }
>
> In vfree(), in_interrupt() and might_sleep() need to be considered.
> void vfree(const void *addr)
> {
> // ...
> if (unlikely(in_interrupt()))
> {
> vfree_atomic(addr);
> return;
> }
> // ...
> might_sleep();
> // ...
> }
Sorry. I didn't check vfree path. So it does look like that
we need to pay special attention to non interrupt part.
>
> The vfree() may sleep if in_interrupt() == false. The RCU read lock region
> could have in_interrupt() == false and spin lock region which only disables
> preemption also has in_interrupt() == false. So the kvfree() cannot appear
> inside a spin lock region or RCU read lock region if the pointer addr points
> to memory allocated by vmalloc().
>
> > > Therefore, we propose modifying the condition to include
> > > in_atomic(). Could we
> > > update the condition as follows: "in_irq() || irqs_disabled() ||
> > > in_atomic()"?
> > Thank you! We look forward to your feedback.
>
> We now think that ‘irqs_disabled() || in_atomic() ||
> rcu_read_lock_held()’ is
> more proper. irqs_disabled() is for irq flag reg, in_atomic() is for
> preempt count and rcu_read_lock_held() is for RCU read lock region.
We cannot use rcu_read_lock_held() in the 'if' statement. The return
value rcu_read_lock_held() could be 1 for some configuraitons regardless
whether rcu_read_lock() is really held or not. In most cases,
rcu_read_lock_held() is used in issuing potential warnings.
Maybe there are other ways to record whether rcu_read_lock() is held or not?
I agree with your that 'irqs_disabled() || in_atomic()' makes sense
since it covers process context local_irq_save() and spin_lock() cases.
If we cannot resolve rcu_read_lock() presence issue, maybe the condition
can be !in_interrupt(), so any process-context will go to a workqueue.
Alternatively, we could have another solution. We could add another
function e.g., bpf_prog_put_rcu(), which indicates that bpf_prog_put()
will be done in rcu context. So if in_interrupt(), do kvfree, otherwise,
put into a workqueue.
>
> -- Teng Qi
>
> On Sun, May 21, 2023 at 11:45 AM Yonghong Song <yhs@...a.com
> <mailto:yhs@...a.com>> wrote:
>
>
>
> On 5/19/23 7:18 AM, Teng Qi wrote:
> > Thank you for your response.
> > > Looks like you only have suspicion here. Could you find a real
> violation
> > > here where __bpf_prog_put() is called with !in_irq() &&
> > > !irqs_disabled(), but inside spin_lock or rcu read lock? I
> have not seen
> > > things like that.
> >
> > For the complex conditions to call bpf_prog_put() with 1 refcnt,
> we have
> > been
> > unable to really trigger this atomic violation after trying to
> construct
> > test cases manually. But we found that it is possible to show
> cases with
> > !in_irq() && !irqs_disabled(), but inside spin_lock or rcu read lock.
> > For example, even a failed case, one of selftest cases of bpf,
> netns_cookie,
> > calls bpf_sock_map_update() and may indirectly call bpf_prog_put()
> > only inside rcu read lock: The possible call stack is:
> > net/core/sock_map.c: 615 bpf_sock_map_update()
> > net/core/sock_map.c: 468 sock_map_update_common()
> > net/core/sock_map.c: 217 sock_map_link()
> > kernel/bpf/syscall.c: 2111 bpf_prog_put()
> >
> > The files about netns_cookie include
> > tools/testing/selftests/bpf/progs/netns_cookie_prog.c and
> > tools/testing/selftests/bpf/prog_tests/netns_cookie.c. We
> inserted the
> > following code in
> > ‘net/core/sock_map.c: 468 sock_map_update_common()’:
> > static int sock_map_update_common(..)
> > {
> > int inIrq = in_irq();
> > int irqsDisabled = irqs_disabled();
> > int preemptBits = preempt_count();
> > int inAtomic = in_atomic();
> > int rcuHeld = rcu_read_lock_held();
> > printk("in_irq() %d, irqs_disabled() %d, preempt_count() %d,
> > in_atomic() %d, rcu_read_lock_held() %d", inIrq,
> irqsDisabled,
> > preemptBits, inAtomic, rcuHeld);
> > }
> >
> > The output message is as follows:
> > root@(none):/root/bpf# ./test_progs -t netns_cookie
> > [ 137.639188] in_irq() 0, irqs_disabled() 0, preempt_count() 0,
> > in_atomic() 0,
> > rcu_read_lock_held() 1
> > #113 netns_cookie:OK
> > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> >
> > We notice that there are numerous callers in kernel/, net/ and
> drivers/,
> > so we
> > highly suggest modifying __bpf_prog_put() to address this gap.
> The gap
> > exists
> > because __bpf_prog_put() is only safe under in_irq() ||
> irqs_disabled()
> > but not in_atomic() || rcu_read_lock_held(). The following code
> snippet may
> > mislead developers into thinking that bpf_prog_put() is safe in all
> > contexts.
> > if (in_irq() || irqs_disabled()) {
> > INIT_WORK(&aux->work, bpf_prog_put_deferred);
> > schedule_work(&aux->work);
> > } else {
> > bpf_prog_put_deferred(&aux->work);
> > }
> >
> > Implicit dependency may lead to issues.
> >
> > > Any problem here?
> > We mentioned it to demonstrate the possibility of kvfree() being
> > called by __bpf_prog_put_noref().
> >
> > Thanks.
> > -- Teng Qi
> >
> > On Wed, May 17, 2023 at 1:08 AM Yonghong Song <yhs@...a.com
> <mailto:yhs@...a.com>
> > <mailto:yhs@...a.com <mailto:yhs@...a.com>>> wrote:
> >
> >
> >
> > On 5/16/23 4:18 AM, starmiku1207184332@...il.com
> <mailto:starmiku1207184332@...il.com>
> > <mailto:starmiku1207184332@...il.com
> <mailto:starmiku1207184332@...il.com>> wrote:
> > > From: Teng Qi <starmiku1207184332@...il.com
> <mailto:starmiku1207184332@...il.com>
> > <mailto:starmiku1207184332@...il.com
> <mailto:starmiku1207184332@...il.com>>>
> > >
> > > Hi, bpf developers,
> > >
> > > We are developing a static tool to check the matching between
> > helpers and the
> > > context of hooks. During our analysis, we have discovered some
> > important
> > > findings that we would like to report.
> > >
> > > ‘kernel/bpf/syscall.c: 2097 __bpf_prog_put()’ shows that
> function
> > > bpf_prog_put_deferred() won`t be called in the condition of
> > > ‘in_irq() || irqs_disabled()’.
> > > if (in_irq() || irqs_disabled()) {
> > > INIT_WORK(&aux->work, bpf_prog_put_deferred);
> > > schedule_work(&aux->work);
> > > } else {
> > >
> > > bpf_prog_put_deferred(&aux->work);
> > > }
> > >
> > > We suspect this condition exists because there might be
> sleepable
> > operations
> > > in the callees of the bpf_prog_put_deferred() function:
> > > kernel/bpf/syscall.c: 2097 __bpf_prog_put()
> > > kernel/bpf/syscall.c: 2084 bpf_prog_put_deferred()
> > > kernel/bpf/syscall.c: 2063 __bpf_prog_put_noref()
> > > kvfree(prog->aux->jited_linfo);
> > > kvfree(prog->aux->linfo);
> >
> > Looks like you only have suspicion here. Could you find a real
> > violation
> > here where __bpf_prog_put() is called with !in_irq() &&
> > !irqs_disabled(), but inside spin_lock or rcu read lock? I
> have not seen
> > things like that.
> >
> > >
> > > Additionally, we found that array prog->aux->jited_linfo is
> > initialized in
> > > ‘kernel/bpf/core.c: 157 bpf_prog_alloc_jited_linfo()’:
> > > prog->aux->jited_linfo = kvcalloc(prog->aux->nr_linfo,
> > > sizeof(*prog->aux->jited_linfo),
> bpf_memcg_flags(GFP_KERNEL |
> > __GFP_NOWARN));
> >
> > Any problem here?
> >
> > >
> > > Our question is whether the condition 'in_irq() ||
> > irqs_disabled() == false' is
> > > sufficient for calling 'kvfree'. We are aware that calling
> > 'kvfree' within the
> > > context of a spin lock or an RCU lock is unsafe.
>
> Your above analysis makes sense if indeed that kvfree cannot appear
> inside a spin lock region or RCU read lock region. But is it true?
> I checked a few code paths in kvfree/kfree. It is either guarded
> with local_irq_save/restore or by
> spin_lock_irqsave/spin_unlock_irqrestore, etc. Did I miss
> anything? Are you talking about RT kernel here?
>
>
> > >
> > > Therefore, we propose modifying the condition to include
> > in_atomic(). Could we
> > > update the condition as follows: "in_irq() ||
> irqs_disabled() ||
> > in_atomic()"?
> > >
> > > Thank you! We look forward to your feedback.
> > >
> > > Signed-off-by: Teng Qi <starmiku1207184332@...il.com
> <mailto:starmiku1207184332@...il.com>
> > <mailto:starmiku1207184332@...il.com
> <mailto:starmiku1207184332@...il.com>>>
> >
>
Powered by blists - more mailing lists