Date: Wed, 28 Jun 2023 20:42:31 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Hou Tao <houtao@...weicloud.com>
Cc: Tejun Heo <tj@...nel.org>, rcu@...r.kernel.org, 
	Network Development <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>, 
	Kernel Team <kernel-team@...com>, Daniel Borkmann <daniel@...earbox.net>, 
	Andrii Nakryiko <andrii@...nel.org>, David Vernet <void@...ifault.com>, 
	"Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH v3 bpf-next 12/13] bpf: Introduce bpf_mem_free_rcu()
 similar to kfree_rcu().

On Wed, Jun 28, 2023 at 7:24 PM Hou Tao <houtao@...weicloud.com> wrote:
>
> Hi,
>
> On 6/28/2023 9:56 AM, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@...nel.org>
> >
> > Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
> > Unlike bpf_mem_[cache_]free(), which links objects into the per-cpu
> > free list for immediate reuse, the _rcu() flavor waits for an RCU grace
> > period and then moves objects onto the free_by_rcu_ttrace list, where
> > they wait for an RCU tasks trace grace period before being freed into slab.
> >
> > The life cycle of objects:
> > alloc: dequeue free_llist
> > free: enqueue free_llist
> > free_rcu: enqueue free_by_rcu -> waiting_for_gp
> > free_llist above high watermark -> free_by_rcu_ttrace
> > after RCU GP waiting_for_gp -> free_by_rcu_ttrace
> > free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab
> >
> > Signed-off-by: Alexei Starovoitov <ast@...nel.org>
> SNIP
> >
> > +static void __free_by_rcu(struct rcu_head *head)
> > +{
> > +     struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> > +     struct bpf_mem_cache *tgt = c->tgt;
> > +     struct llist_node *llnode;
> > +
> > +     llnode = llist_del_all(&c->waiting_for_gp);
> > +     if (!llnode)
> > +             goto out;
> > +
> > +     llist_add_batch(llnode, c->waiting_for_gp_tail, &tgt->free_by_rcu_ttrace);
> > +
> > +     /* Objects went through regular RCU GP. Send them to RCU tasks trace */
> > +     do_call_rcu_ttrace(tgt);
>
> I still see reports of leaked free_by_rcu_ttrace objects without adding
> any extra hack, just by using bpf_mem_cache_free_rcu() in htab.

Please share the steps to repro.

> When the bpf memory allocator is freed through free_mem_alloc(), the
> following sequence may leak objects on free_by_rcu_ttrace:
>
> P1: bpf_mem_alloc_destroy()
>     P2: __free_by_rcu()
>
>     // got false
>     P2: read c->draining
>
> P1: c->draining = true
> P1: llist_del_all(&c->free_by_rcu_ttrace)
>
>     // add to free_by_rcu_ttrace again
>     P2: llist_add_batch(..., &tgt->free_by_rcu_ttrace)
>         P2: do_call_rcu_ttrace()
>             // call_rcu_ttrace_in_progress is 1, so xchg returns 1
>             // and the objects are not moved to waiting_for_gp_ttrace
>             P2: atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)
>
> // got 1
> P1: atomic_read(&c->call_rcu_ttrace_in_progress)
> // objects in free_by_rcu_ttrace are leaked
>
> I think the race could be fixed by checking c->draining in
> do_call_rcu_ttrace() when atomic_xchg() returns 1 as shown below:

If the theory of the bug holds true then the fix makes sense,
but did you repro without the fix and confirm you cannot repro with it?
We should not add extra code based on a hunch.
