netdev - Re: [PATCH v2 bpf-next 09/13] bpf: Allow reuse from waiting_for_gp

Message-ID: <957dd5cd-0855-1197-7045-4cb1590bd753@huaweicloud.com>
Date:   Wed, 28 Jun 2023 16:09:14 +0800
From:   Hou Tao <houtao@...weicloud.com>
To:     Alexei Starovoitov <ast@...a.com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        David Vernet <void@...ifault.com>,
        "Paul E. McKenney" <paulmck@...nel.org>, Tejun Heo <tj@...nel.org>,
        rcu@...r.kernel.org, Network Development <netdev@...r.kernel.org>,
        bpf <bpf@...r.kernel.org>, Kernel Team <kernel-team@...com>
Subject: Re: [PATCH v2 bpf-next 09/13] bpf: Allow reuse from
 waiting_for_gp_ttrace list.

Hi,

On 6/28/2023 8:59 AM, Alexei Starovoitov wrote:
> On 6/26/23 12:16 AM, Hou Tao wrote:
>> Hi,
>>
>> On 6/26/2023 12:42 PM, Alexei Starovoitov wrote:
>>> On Sun, Jun 25, 2023 at 8:30 PM Hou Tao <houtao@...weicloud.com> wrote:
>>>> Hi,
>>>>
>>>> On 6/24/2023 11:13 AM, Alexei Starovoitov wrote:
>>>>> From: Alexei Starovoitov <ast@...nel.org>
>>>>>
>>>>> alloc_bulk() can reuse elements from free_by_rcu_ttrace.
>>>>> Let it reuse from waiting_for_gp_ttrace as well to avoid
>>>>> unnecessary kmalloc().
>>>>>
>>>>> Signed-off-by: Alexei Starovoitov <ast@...nel.org>
>>>>> ---
>>>>>   kernel/bpf/memalloc.c | 9 +++++++++
>>>>>   1 file changed, 9 insertions(+)
>>>>>
SNIP
>>        // free A (from c1), ..., last free X (allocated from c0)
>>      P3: unit_free(c1)
>>          // the last freed element X is from c0
>>          c1->tgt = c0
>>          c1->free_llist->first -> X -> Y -> ... -> A
>>      P3: free_bulk(c1)
>>          enque_to_free(c0)
>>              c0->free_by_rcu_ttrace->first -> A -> ... -> Y -> X
>>          __llist_add_batch(c0->waiting_for_gp_ttrace)
>>              c0->waiting_for_gp_ttrace = A -> ... -> Y -> X
>
> In theory that's possible, but for this to happen one cpu needs
> to be a thousand times slower than all others and since there is no
> preemption in llist_del_first I don't think we need to worry about it.

I am not sure this case can be ruled out in a VM. After all, CPU X is
just a thread on the host, and it may be preempted at any time and for
any duration.
> Also with removal of _tail optimization the above
> llist_add_batch(waiting_for_gp_ttrace)
> will become a loop, so reused element will be at the very end
> instead of top, so one cpu would have to be a million times slower,
> which is not realistic.

It is still possible that A will be added back as
waiting_for_gp_ttrace->first after switching to llist_add(), as shown
below. My question is: how much benefit is there in reusing from
waiting_for_gp_ttrace?

    // free A (from c1), ..., last free X (allocated from c0) 
    P3: unit_free(c1)
        // the last freed element X is allocated from c0
        c1->tgt = c0
        c1->free_llist->first -> A -> ... -> Y
        c1->free_llist_extra -> X

    P3: free_bulk(c1)
        enque_to_free(c0)
            c0->free_by_rcu_ttrace->first -> Y -> ... -> A
            c0->free_by_rcu_ttrace->first -> X -> Y -> ... -> A

        llist_add(c0->waiting_for_gp_ttrace)
            c0->waiting_for_gp_ttrace = A -> ... -> Y -> X
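
To make the concern concrete, here is a minimal userspace sketch of the
interleaving, replayed deterministically in a single thread. It assumes
C11 atomics as a stand-in for the kernel's llist helpers. struct node,
struct list, push(), pop() and aba_demo() are hypothetical names used
only for illustration and are not taken from kernel/bpf/memalloc.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for llist_node / llist_head. */
struct node { struct node *next; };
struct list { _Atomic(struct node *) first; };

/* Lock-free push, similar in spirit to llist_add(). */
static void push(struct list *l, struct node *n)
{
	struct node *old = atomic_load(&l->first);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&l->first, &old, n));
}

/* Lock-free pop, similar in spirit to llist_del_first(). */
static struct node *pop(struct list *l)
{
	struct node *old = atomic_load(&l->first);

	while (old && !atomic_compare_exchange_weak(&l->first, &old, old->next))
		;
	return old;
}

/* Returns 1 if the stale cmpxchg succeeded and corrupted the list. */
static int aba_demo(void)
{
	struct node A, B, X;
	struct list gp;		/* plays the role of waiting_for_gp_ttrace */

	atomic_init(&gp.first, NULL);
	push(&gp, &B);
	push(&gp, &A);		/* gp: A -> B */

	/* P1: first half of a pop, then it is delayed. */
	struct node *entry = atomic_load(&gp.first);	/* A */
	struct node *next = entry->next;		/* B */

	/* Other CPUs: A and B are both taken off the list for reuse. */
	struct node *a = pop(&gp);
	struct node *b = pop(&gp);
	assert(a == &A && b == &B);

	/* A is freed again after X, so A ends up as first once more. */
	push(&gp, &X);
	push(&gp, a);		/* gp: A -> X, while B is still in use */

	/* P1 resumes: first == A again, so the stale cmpxchg succeeds. */
	int swapped = atomic_compare_exchange_strong(&gp.first, &entry, next);

	/* first now points at in-use B, and X is no longer reachable. */
	return swapped && atomic_load(&gp.first) == &B && A.next == &X;
}
```

Calling aba_demo() from a small main() reproduces the problem without
any real concurrency: the cmpxchg only compares the pointer value, so it
cannot tell that A left the list and came back while P1 was delayed.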

>
>> P1:
>>      // A is added back as first again
>>      // but llist_del_first() didn't know
>>      try_cmpxchg(&c0->waiting_for_gp_ttrace->first, A, B)
>>      // c0->waiting_for_gp_ttrace is corrupted
>>      c0->waiting_for_gp_ttrace->first = B
>>