Message-ID: <b002f2c3-7a5b-591c-8aa1-75b4dbedcf23@huaweicloud.com>
Date: Sat, 24 Jun 2023 17:05:19 +0800
From: Hou Tao <houtao@...weicloud.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>, daniel@...earbox.net,
andrii@...nel.org, void@...ifault.com, paulmck@...nel.org
Cc: tj@...nel.org, rcu@...r.kernel.org, netdev@...r.kernel.org,
bpf@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH v2 bpf-next 12/13] bpf: Introduce bpf_mem_free_rcu()
similar to kfree_rcu().
Hi,
On 6/24/2023 11:13 AM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@...nel.org>
>
> Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
> Unlike bpf_mem_[cache_]free(), which links objects for immediate reuse into
> the per-cpu free list, the _rcu() flavor waits for an RCU grace period and
> then moves objects onto the free_by_rcu_ttrace list, where they wait for an
> RCU tasks trace grace period before being freed into slab.
SNIP
> +static void __free_by_rcu(struct rcu_head *head)
> +{
> + struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> + struct bpf_mem_cache *tgt = c->tgt;
> + struct llist_node *llnode;
> +
> + if (unlikely(READ_ONCE(c->draining)))
> + goto out;
Because the reads of c->draining and the llist_add_batch(...,
free_by_rcu_ttrace) calls are lockless, checking draining here can not
prevent the leak of objects on c->free_by_rcu_ttrace, as shown below
(hope the formatting is OK now). A simple fix is to drain
free_by_rcu_ttrace twice as suggested before. Another option is to
check c->draining again in __free_by_rcu() when atomic_xchg() returns
1, and to call free_all() on free_by_rcu_ttrace if draining is true (a
rough sketch of this option follows the diagram).
P1: bpf_mem_alloc_destroy()
                P2: __free_by_rcu()

                // got false
                P2: read c->draining

P1: c->draining = true
P1: llist_del_all(&c->free_by_rcu_ttrace)

                // add to free_by_rcu_ttrace again
                P2: llist_add_batch(..., &tgt->free_by_rcu_ttrace)
                P2: do_call_rcu_ttrace()
                // call_rcu_ttrace_in_progress is 1, so xchg returns 1
                // and the list is not moved to waiting_for_gp_ttrace
                P2: atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)

// got 1
P1: atomic_read(&c->call_rcu_ttrace_in_progress)

// objects on free_by_rcu_ttrace are leaked
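Something like the following could work for the second option. It is
just a rough sketch against this patch, assuming do_call_rcu_ttrace()
keeps its current shape and that free_all() and percpu_size can be
used here the same way as in drain_mem_cache(); only the re-check on
the xchg-failure path is new:

static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
{
	struct llist_node *llnode;

	if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)) {
		if (unlikely(READ_ONCE(c->draining))) {
			/* The objects were added to free_by_rcu_ttrace
			 * after bpf_mem_alloc_destroy() emptied it, so
			 * free them directly instead of leaking them.
			 */
			llnode = llist_del_all(&c->free_by_rcu_ttrace);
			free_all(llnode, !!c->percpu_size);
		}
		return;
	}

	/* ... the rest of do_call_rcu_ttrace() is unchanged ... */
}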
c->draining also can't guarantee that bpf_mem_alloc_destroy() will wait
for an inflight call_rcu_tasks_trace() callback, as shown in the
following two cases (these two cases are the same as reported in v1; I
have only reformatted the diagrams). So I suggest doing
bpf_mem_alloc_destroy() as follows:
	if (ma->cache) {
		rcu_in_progress = 0;
		for_each_possible_cpu(cpu) {
			c = per_cpu_ptr(ma->cache, cpu);
			irq_work_sync(&c->refill_work);
			drain_mem_cache(c);
			rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
		}
		for_each_possible_cpu(cpu) {
			c = per_cpu_ptr(ma->cache, cpu);
			rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
		}
Case 1:

P1: bpf_mem_alloc_destroy()
                P2: __free_by_rcu()

                // got false
                P2: read c->draining

P1: c->draining = true
// got 0
P1: atomic_read(&c->call_rcu_ttrace_in_progress)

                P2: do_call_rcu_ttrace()
                // returns 0
                P2: atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)
                P2: call_rcu_tasks_trace()
                P2: atomic_set(&c->call_rcu_in_progress, 0)

// also got 0
P1: atomic_read(&c->call_rcu_in_progress)
// won't wait for the inflight __free_rcu_tasks_trace()
Case 2:

P1: bpf_mem_alloc_destroy()
                P2: __free_by_rcu() for c1

                P2: read c1->draining

P1: c0->draining = true
P1: c1->draining = true

// both in_progress counters are 0
P1: read c0->call_rcu_in_progress
P1: read c0->call_rcu_ttrace_in_progress

                // c1->tgt is c0
                // c1->call_rcu_in_progress is 1
                // c0->call_rcu_ttrace_in_progress is 0
                P2: llist_add_batch(..., &c0->free_by_rcu_ttrace)
                P2: xchg(&c0->call_rcu_ttrace_in_progress, 1)
                P2: call_rcu_tasks_trace(c0)
                P2: c1->call_rcu_in_progress = 0

// both in_progress counters are 0
P1: read c1->call_rcu_in_progress
P1: read c1->call_rcu_ttrace_in_progress

// BAD! There is still an inflight tasks trace RCU callback
P1: free_mem_alloc_no_barrier()
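For reference, the barrier path which Case 2 ends up skipping has to
wait for both kinds of inflight callbacks before freeing 'ma'. Roughly
(a sketch based on this series; rcu_barrier_tasks_trace() and
rcu_trace_implies_rcu_gp() are the existing RCU APIs):

static void free_mem_alloc(struct bpf_mem_alloc *ma)
{
	rcu_barrier(); /* wait for inflight __free_by_rcu() */
	rcu_barrier_tasks_trace(); /* wait for __free_rcu_tasks_trace() */
	/* __free_rcu_tasks_trace() skips the extra call_rcu() when the
	 * tasks trace grace period implies a regular RCU GP
	 */
	if (!rcu_trace_implies_rcu_gp())
		rcu_barrier();
	free_mem_alloc_no_barrier(ma);
}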
> +
> + llnode = llist_del_all(&c->waiting_for_gp);
> + if (!llnode)
> + goto out;
> +
> + if (llist_add_batch(llnode, c->waiting_for_gp_tail, &tgt->free_by_rcu_ttrace))
> + tgt->free_by_rcu_ttrace_tail = c->waiting_for_gp_tail;
> +
> + /* Objects went through regular RCU GP. Send them to RCU tasks trace */
> + do_call_rcu_ttrace(tgt);
> +out:
> + atomic_set(&c->call_rcu_in_progress, 0);
> +}
> +