Message-ID: <df46e0cc-cf0f-76a1-60a0-ef1ef8b8280a@huaweicloud.com>
Date: Thu, 29 Jun 2023 12:31:10 +0800
From: Hou Tao <houtao@...weicloud.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Tejun Heo <tj@...nel.org>, rcu@...r.kernel.org,
Network Development <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Kernel Team <kernel-team@...com>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, David Vernet <void@...ifault.com>,
"Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH v3 bpf-next 12/13] bpf: Introduce bpf_mem_free_rcu()
similar to kfree_rcu().
Hi,
On 6/29/2023 11:42 AM, Alexei Starovoitov wrote:
> On Wed, Jun 28, 2023 at 7:24 PM Hou Tao <houtao@...weicloud.com> wrote:
>> Hi,
>>
>> On 6/28/2023 9:56 AM, Alexei Starovoitov wrote:
>>> From: Alexei Starovoitov <ast@...nel.org>
>>>
>>> Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
>>> Unlike bpf_mem_[cache_]free(), which links objects for immediate reuse into
>>> a per-cpu free list, the _rcu() flavor waits for an RCU grace period and then
>>> moves objects onto the free_by_rcu_ttrace list, where they wait for an RCU
>>> tasks trace grace period before being freed into slab.
>>>
>>> The life cycle of objects:
>>> alloc: dequeue free_llist
>>> free: enqueue free_llist
>>> free_rcu: enqueue free_by_rcu -> waiting_for_gp
>>> free_llist above high watermark -> free_by_rcu_ttrace
>>> after RCU GP waiting_for_gp -> free_by_rcu_ttrace
>>> free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab
>>>
>>> Signed-off-by: Alexei Starovoitov <ast@...nel.org>
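
(As an aside for readers following the thread: the hops above, written
against the fields of struct bpf_mem_cache. This is just my mental model
of the patch, not actual code; the callback names are the ones visible
in this series:)

	/*
	 * bpf_mem_cache_free():      obj -> c->free_llist (immediate reuse)
	 * bpf_mem_cache_free_rcu():  obj -> c->free_by_rcu
	 *   call_rcu():              free_by_rcu -> waiting_for_gp
	 * __free_by_rcu():           after a regular RCU GP,
	 *                            waiting_for_gp -> tgt->free_by_rcu_ttrace
	 * do_call_rcu_ttrace():      free_by_rcu_ttrace -> waiting_for_gp_ttrace
	 * __free_rcu():              after an RCU tasks trace GP,
	 *                            waiting_for_gp_ttrace -> slab (free_all())
	 */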
>> SNIP
>>> +static void __free_by_rcu(struct rcu_head *head)
>>> +{
>>> + struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>>> + struct bpf_mem_cache *tgt = c->tgt;
>>> + struct llist_node *llnode;
>>> +
>>> + llnode = llist_del_all(&c->waiting_for_gp);
>>> + if (!llnode)
>>> + goto out;
>>> +
>>> + llist_add_batch(llnode, c->waiting_for_gp_tail, &tgt->free_by_rcu_ttrace);
>>> +
>>> + /* Objects went through regular RCU GP. Send them to RCU tasks trace */
>>> + do_call_rcu_ttrace(tgt);
>> I still see leaked free_by_rcu_ttrace objects without adding any extra
>> hack except using bpf_mem_cache_free_rcu() in htab.
> Please share the steps to repro.
First, I added the attached patch to check whether there are leaked
elements when free_mem_alloc_no_barrier() is called, then I ran the
following script to stress-test in a KVM VM with 72 CPUs and 64 GB of
memory:
#!/bin/bash

BASE=/all/code/linux_working/

{
        cd $BASE/tools/testing/selftests/bpf
        for x in $(seq 2)
        do
                while true; do ./test_maps &>/dev/null; done &
        done
} &

{
        cd $BASE/samples/bpf
        for y in $(seq 8)
        do
                while true; do ./map_perf_test &>/dev/null; done &
        done
} &

{
        cd $BASE/tools/testing/selftests/bpf
        for z in $(seq 8)
        do
                while true
                do
                        for name in overwrite batch_add_batch_del add_del_on_diff_cpu
                        do
                                ./bench htab-mem -w1 -d5 -a -p32 --use-case $name
                        done
                done &>/dev/null &
                sleep 0.3
        done
} &
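
The check itself is simple; the attached 0001-bpf-Check-leaked-objects.patch
does roughly the following before the per-cpu caches are freed (a simplified
sketch of the idea, not the exact diff; check_mem_cache() is a hypothetical
name):

	static void check_mem_cache(struct bpf_mem_cache *c)
	{
		/* At destroy time every per-cpu list must already be
		 * drained; anything still queued will never reach slab.
		 */
		WARN_ON_ONCE(!llist_empty(&c->free_by_rcu_ttrace));
		WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp_ttrace));
		WARN_ON_ONCE(!llist_empty(&c->free_by_rcu));
		WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp));
	}

Under this load the warning fires on free_by_rcu_ttrace.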
>
>> When bpf ma is freed through free_mem_alloc(), the following sequence
>> may lead to leak of free_by_rcu_ttrace:
>>
>> P1: bpf_mem_alloc_destroy()
>> P2: __free_by_rcu()
>>
>> // got false
>> P2: read c->draining
>>
>> P1: c->draining = true
>> P1: llist_del_all(&c->free_by_rcu_ttrace)
>>
>> // add to free_by_rcu_ttrace again
>> P2: llist_add_batch(..., &tgt->free_by_rcu_ttrace)
>> P2: do_call_rcu_ttrace()
>> // call_rcu_ttrace_in_progress is 1, so xchg returns 1
>> // and free_by_rcu_ttrace is not moved to waiting_for_gp_ttrace
>> P2: atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)
>>
>> // got 1
>> P1: atomic_read(&c->call_rcu_ttrace_in_progress)
>> // objects in free_by_rcu_ttrace are leaked
>>
>> I think the race could be fixed by checking c->draining in
>> do_call_rcu_ttrace() when atomic_xchg() returns 1 as shown below:
> If the theory of the bug holds true then the fix makes sense,
> but did you repro without the fix and confirm you cannot repro with the fix?
> We should not add extra code based on a hunch.
Yes. With only the fix applied, the leak has not occurred in the last 13
hours of testing.
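
Since the diff was snipped above, the shape of the fix I tested is roughly
the following (a sketch against do_call_rcu_ttrace() in this patch, not the
exact diff):

	static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
	{
		struct llist_node *llnode;

		if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)) {
			if (unlikely(READ_ONCE(c->draining))) {
				/* Raced with bpf_mem_alloc_destroy():
				 * the in-flight callback will not requeue,
				 * so free the re-added objects here instead
				 * of leaking them.
				 */
				llnode = llist_del_all(&c->free_by_rcu_ttrace);
				free_all(llnode, !!c->percpu_size);
			}
			return;
		}
		/* ... the rest of do_call_rcu_ttrace() is unchanged ... */
	}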