[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2bf11b56-7494-c0a9-09d4-c9e41aaba850@meta.com>
Date: Tue, 27 Jun 2023 17:59:54 -0700
From: Alexei Starovoitov <ast@...a.com>
To: Hou Tao <houtao@...weicloud.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, David Vernet <void@...ifault.com>,
"Paul E. McKenney" <paulmck@...nel.org>, Tejun Heo <tj@...nel.org>,
rcu@...r.kernel.org, Network Development <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, Kernel Team <kernel-team@...com>
Subject: Re: [PATCH v2 bpf-next 09/13] bpf: Allow reuse from
waiting_for_gp_ttrace list.
On 6/26/23 12:16 AM, Hou Tao wrote:
> Hi,
>
> On 6/26/2023 12:42 PM, Alexei Starovoitov wrote:
>> On Sun, Jun 25, 2023 at 8:30 PM Hou Tao <houtao@...weicloud.com> wrote:
>>> Hi,
>>>
>>> On 6/24/2023 11:13 AM, Alexei Starovoitov wrote:
>>>> From: Alexei Starovoitov <ast@...nel.org>
>>>>
>>>> alloc_bulk() can reuse elements from free_by_rcu_ttrace.
>>>> Let it reuse from waiting_for_gp_ttrace as well to avoid unnecessary kmalloc().
>>>>
>>>> Signed-off-by: Alexei Starovoitov <ast@...nel.org>
>>>> ---
>>>> kernel/bpf/memalloc.c | 9 +++++++++
>>>> 1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>>>> index 692a9a30c1dc..666917c16e87 100644
>>>> --- a/kernel/bpf/memalloc.c
>>>> +++ b/kernel/bpf/memalloc.c
>>>> @@ -203,6 +203,15 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
>>>> if (i >= cnt)
>>>> return;
>>>>
>>>> + for (; i < cnt; i++) {
>>>> + obj = llist_del_first(&c->waiting_for_gp_ttrace);
>>> After allowing to reuse elements from waiting_for_gp_ttrace, there may
>>> be concurrent llist_del_first() and llist_del_all() as shown below and
>>> llist_del_first() is not safe because the elements freed from free_rcu()
>>> could be reused immediately and head->first may be added back to
>>> c0->waiting_for_gp_ttrace by other process.
>>>
>>> // c0
>>> alloc_bulk()
>>> llist_del_first(&c->waiting_for_gp_ttrace)
>>>
>>> // c1->tgt = c0
>>> free_rcu()
>>> llist_del_all(&c->waiting_for_gp_ttrace)
>> I'm still thinking about how to fix the other issues you've reported,
>> but this one, I believe, is fine.
>> Are you basing 'not safe' on a comment?
>> Why xchg(&head->first, NULL); on one cpu and
>> try_cmpxchg(&head->first, &entry, next);
>> is unsafe?
> No, I didn't reason it only based on the comment in llist.h. The reason
> I though it was not safe because llist_del_first() may have ABA problem
> as pointed by you early, because after __free_rcu(), the elements on
> waiting_for_gp_ttrace would be available immediately and
> waiting_for_gp_ttrace->first may be reused and then added back to
> waiting_for_gp_ttrace->first again as shown below.
>
> // c0->waiting_for_gp_ttrace
> A -> B -> C -> nil
>
> P1:
> alloc_bulk(c0)
> llist_del_first(&c0->waiting_for_gp_ttrace)
> entry = A
> next = B
>
> P2: __free_rcu
> // A/B/C is freed back to slab
> // c0->waiting_for_gp_ttrace->first = NULL
> free_all(llist_del_all(c0))
>
> P3: alloc_bulk(c1)
> // got A
> __kmalloc()
>
> // free A (from c1), ..., last free X (allocated from c0)
> P3: unit_free(c1)
> // the last freed element X is from c0
> c1->tgt = c0
> c1->free_llist->first -> X -> Y -> ... -> A
> P3: free_bulk(c1)
> enque_to_free(c0)
> c0->free_by_rcu_ttrace->first -> A -> ... -> Y -> X
> __llist_add_batch(c0->waiting_for_gp_ttrace)
> c0->waiting_for_gp_ttrace = A -> ... -> Y -> X
In theory that's possible, but for this to happen one cpu needs
to be thousand times slower than all others and since there is no
preemption in llist_del_first I don't think we need to worry about it.
Also with removal of _tail optimization the above
llist_add_batch(waiting_for_gp_ttrace)
will become a loop, so reused element will be at the very end
instead of top, so one cpu to million times slower which is not realistic.
> P1:
> // A is added back as first again
> // but llist_del_first() didn't know
> try_cmpxhg(&c0->waiting_for_gp_ttrace->first, A, B)
> // c0->waiting_for_gp_trrace is corrupted
> c0->waiting_for_gp_ttrace->first = B
>
Powered by blists - more mailing lists