Date:   Sat, 18 Sep 2021 10:42:00 +0800
From:   Yunsheng Lin <linyunsheng@...wei.com>
To:     Eric Dumazet <edumazet@...gle.com>
CC:     Jesper Dangaard Brouer <jbrouer@...hat.com>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>, <linuxarm@...neuler.org>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        "Jonathan Lemon" <jonathan.lemon@...il.com>,
        Alexander Lobakin <alobakin@...me>,
        "Willem de Bruijn" <willemb@...gle.com>,
        Cong Wang <cong.wang@...edance.com>,
        "Paolo Abeni" <pabeni@...hat.com>, Kevin Hao <haokexin@...il.com>,
        Aleksandr Nogikh <nogikh@...gle.com>,
        Marco Elver <elver@...gle.com>, <memxor@...il.com>,
        David Ahern <dsahern@...il.com>
Subject: Re: [Linuxarm] Re: [PATCH net-next v2 3/3] skbuff: keep track of pp
 page when __skb_frag_ref() is called

On 2021/9/18 1:15, Eric Dumazet wrote:
> On Wed, Sep 15, 2021 at 7:05 PM Yunsheng Lin <linyunsheng@...wei.com> wrote:
> 
>> As mentioned before, Tx recycling is based on a page_pool instance per socket.
>> It shares the page_pool instance with rx.
>>
>> Anyway, based on feedback from edumazet and dsahern, I am still trying to
>> see if the page pool is meaningful for tx.
>>
> 
> It is not for the generic Linux TCP stack, but perhaps for benchmarks.

I am not sure I understand what the above means. Did you mean that tx
recycling only benefits benchmark tools such as iperf/netperf, but not
real use cases?

> 
> Unless you dedicate one TX/RX pair per TCP socket ?

Do you mean a TX/RX pair of netdev queues, or a TX/RX pair for the
recycling pool?

As for the TX/RX pair of netdev queues, I am not dedicating one TX/RX
netdev queue pair per TCP socket.

As for the TX/RX pair for the recycling pool, my initial thinking is
that each NAPI/socket context has a 'struct pp_alloc_cache', which
provides a last-in-first-out, lockless mini pool specific to that
NAPI/socket context, plus a central locked 'struct ptr_ring' pool shared
by all the NAPI/socket mini pools. When a NAPI/socket context's mini
pool is empty or full, it can refill some pages from the central pool or
flush some pages to the central pool.
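
Something like the sketch below is what I have in mind. To be clear,
this is only an illustration: struct pp_alloc_cache, struct ptr_ring,
the PP_ALLOC_CACHE_* constants, ptr_ring_consume()/ptr_ring_produce()
and put_page() are existing kernel constructs, while struct ctx_pool and
the ctx_pool_get()/ctx_pool_put() helpers are hypothetical names made up
for this discussion:

/* Sketch only: per NAPI/socket lockless mini pool backed by a locked
 * central ptr_ring.
 */
struct ctx_pool {
	struct pp_alloc_cache cache;	/* LIFO, lockless, per context */
	struct ptr_ring *central;	/* locked pool shared by all contexts */
};

static struct page *ctx_pool_get(struct ctx_pool *p)
{
	struct page *page;

	if (unlikely(!p->cache.count)) {
		/* mini pool empty: refill a batch from the central pool */
		while (p->cache.count < PP_ALLOC_CACHE_REFILL) {
			page = ptr_ring_consume(p->central);
			if (!page)
				break;
			p->cache.cache[p->cache.count++] = page;
		}
		if (!p->cache.count)
			return NULL;
	}
	return p->cache.cache[--p->cache.count];
}

static void ctx_pool_put(struct ctx_pool *p, struct page *page)
{
	if (unlikely(p->cache.count == PP_ALLOC_CACHE_SIZE)) {
		/* mini pool full: flush this page to the central pool */
		if (ptr_ring_produce(p->central, page))
			put_page(page);	/* central pool also full */
		return;
	}
	p->cache.cache[p->cache.count++] = page;
}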

I am not sure yet whether the locked central pool is needed, or whether
page pool's 'struct ptr_ring' is the right choice for that central pool.

> 
> Most high performance TCP flows are using zerocopy, I am really not
> sure why we would need to 'optimize' the path that is wasting cpu
> cycles doing user->kernel copies anyway, at the cost of insane
> complexity.

As I understand it, zerocopy mostly targets the big-packet and non-IOMMU
cases.
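
For reference, the zerocopy tx path being discussed here is
MSG_ZEROCOPY (Documentation/networking/msg_zerocopy.rst). Roughly,
assuming a kernel/libc new enough to define SO_ZEROCOPY/MSG_ZEROCOPY
(send_zerocopy() is just an illustrative wrapper):

#include <sys/types.h>
#include <sys/socket.h>

static ssize_t send_zerocopy(int fd, const void *buf, size_t len)
{
	int one = 1;

	/* opt in once per socket */
	if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)))
		return -1;

	/* the user pages stay pinned and shared with the kernel until a
	 * completion notification is read back from MSG_ERRQUEUE, which
	 * is why this mainly pays off for large sends */
	return send(fd, buf, len, MSG_ZEROCOPY);
}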

As for complexity, I am not convinced yet that it is that complex, as it
mostly reuses the existing infrastructure to support tx recycling.

The point is that most skbs are freed in the context of NAPI or a
socket; it seems we could utilize that to batch the allocating/freeing
of skb/page_frag, or to reuse skb/page_frag/dma mappings to avoid
(IO/CPU)TLB misses, cache misses, and the overhead of spinlocks and dma
mapping.
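
For example, the dma mapping reuse could look like the sketch below
(again illustration only, reusing the hypothetical ctx_pool above;
page_pool_get_dma_addr() and dma_sync_single_for_device() are the real
APIs, tx_recycle_frag()/tx_map_frag() are made-up names, and this
assumes the pool was created with PP_FLAG_DMA_MAP so the dma addr is
stored in the page):

static void tx_recycle_frag(struct ctx_pool *p, struct page *page)
{
	/* the DMA mapping stays valid across recycles, so the next
	 * transmit skips the per-packet dma_map_single() */
	ctx_pool_put(p, page);
}

static dma_addr_t tx_map_frag(struct device *dev, struct page *page)
{
	dma_addr_t dma = page_pool_get_dma_addr(page);

	/* only a sync is needed before handing the page to the NIC */
	dma_sync_single_for_device(dev, dma, PAGE_SIZE, DMA_TO_DEVICE);
	return dma;
}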


