Message-ID: <0c59b17a-4bc7-9082-1362-77256bec9abe@huawei.com>
Date:   Sat, 18 Sep 2021 10:42:00 +0800
From:   Yunsheng Lin <linyunsheng@...wei.com>
To:     Eric Dumazet <edumazet@...gle.com>
CC:     Jesper Dangaard Brouer <jbrouer@...hat.com>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>, <linuxarm@...neuler.org>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        "Jonathan Lemon" <jonathan.lemon@...il.com>,
        Alexander Lobakin <alobakin@...me>,
        "Willem de Bruijn" <willemb@...gle.com>,
        Cong Wang <cong.wang@...edance.com>,
        "Paolo Abeni" <pabeni@...hat.com>, Kevin Hao <haokexin@...il.com>,
        Aleksandr Nogikh <nogikh@...gle.com>,
        Marco Elver <elver@...gle.com>, <memxor@...il.com>,
        David Ahern <dsahern@...il.com>
Subject: Re: [Linuxarm] Re: [PATCH net-next v2 3/3] skbuff: keep track of pp
 page when __skb_frag_ref() is called

On 2021/9/18 1:15, Eric Dumazet wrote:
> On Wed, Sep 15, 2021 at 7:05 PM Yunsheng Lin <linyunsheng@...wei.com> wrote:
> 
>> As memtioned before, Tx recycling is based on page_pool instance per socket.
>> it shares the page_pool instance with rx.
>>
>> Anyway, based on feedback from edumazet and dsahern, I am still trying to
>> see if the page pool is meaningful for tx.
>>
> 
> It is not for generic linux TCP stack, but perhaps for benchmarks.

I am not sure I understand what the above means. Did you mean that
tx recycling only benefits benchmark tools such as iperf/netperf,
but not real use cases?

> 
> Unless you dedicate one TX/RX pair per TCP socket ?

Do you mean a TX/RX pair of netdev queues, or a TX/RX pair of
recycling pools?

As for netdev queues, I am not dedicating one TX/RX netdev queue
pair per TCP socket.

As for recycling pools, my initial thinking is that each NAPI/socket
context has a 'struct pp_alloc_cache', which provides a lockless,
last-in-first-out mini pool specific to that NAPI/socket context,
and that a central, locked, queue-based 'struct ptr_ring' pool backs
all the NAPI/socket mini pools. When a NAPI/socket context's mini
pool is empty, it can refill some pages from the central pool; when
it is full, it can flush some pages to the central pool.

I am not yet sure whether the locked central pool is needed, or
whether the page pool's 'struct ptr_ring' is the right structure
to serve as the locked central pool.
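
To make the two-level layout above concrete, here is a minimal user-space
sketch in C. It is only a model of the idea, not kernel code: it does not
use the real 'struct pp_alloc_cache' or 'struct ptr_ring', and every name
in it (mini_pool, central_pool, CACHE_SIZE, BATCH, RING_SIZE and the
helper functions) is hypothetical. A pthread mutex stands in for the
ring's lock, and the sizes are arbitrary.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* All names and sizes here are illustrative assumptions. */
#define CACHE_SIZE 8   /* capacity of one per-context mini pool */
#define BATCH      4   /* pages moved per refill or flush */
#define RING_SIZE  64  /* capacity of the central pool */

/* Central pool: a FIFO ring shared by all contexts, guarded by a lock. */
struct central_pool {
	pthread_mutex_t lock;
	void *ring[RING_SIZE];
	size_t head;  /* index of the oldest queued page */
	size_t len;   /* number of queued pages */
};

/* Mini pool: a LIFO stack owned by one context, used without locking. */
struct mini_pool {
	void *cache[CACHE_SIZE];
	size_t count;
	struct central_pool *central;
};

static void central_init(struct central_pool *cp)
{
	pthread_mutex_init(&cp->lock, NULL);
	cp->head = 0;
	cp->len = 0;
}

static void mini_init(struct mini_pool *mp, struct central_pool *cp)
{
	mp->count = 0;
	mp->central = cp;
}

/* Queue up to n pages from src[]; returns how many were accepted. */
static size_t central_put(struct central_pool *cp, void **src, size_t n)
{
	size_t i = 0;

	pthread_mutex_lock(&cp->lock);
	while (i < n && cp->len < RING_SIZE) {
		cp->ring[(cp->head + cp->len) % RING_SIZE] = src[i++];
		cp->len++;
	}
	pthread_mutex_unlock(&cp->lock);
	return i;
}

/* Dequeue up to n pages into dst[]; returns how many were taken. */
static size_t central_get(struct central_pool *cp, void **dst, size_t n)
{
	size_t i = 0;

	pthread_mutex_lock(&cp->lock);
	while (i < n && cp->len > 0) {
		dst[i++] = cp->ring[cp->head];
		cp->head = (cp->head + 1) % RING_SIZE;
		cp->len--;
	}
	pthread_mutex_unlock(&cp->lock);
	return i;
}

/* Get a recycled page; NULL means both levels are empty. */
static void *mini_get(struct mini_pool *mp)
{
	if (mp->count == 0)  /* empty: refill a batch from the central pool */
		mp->count = central_get(mp->central, mp->cache, BATCH);
	return mp->count ? mp->cache[--mp->count] : NULL;
}

/* Recycle a page; -1 means neither level has room and the caller frees it. */
static int mini_put(struct mini_pool *mp, void *page)
{
	if (mp->count == CACHE_SIZE) {  /* full: flush the top BATCH entries */
		size_t start = CACHE_SIZE - BATCH;
		size_t moved = central_put(mp->central, &mp->cache[start], BATCH);

		if (moved == 0)
			return -1;
		/* Keep any entries the central pool could not accept. */
		memmove(&mp->cache[start], &mp->cache[start + moved],
			(BATCH - moved) * sizeof(void *));
		mp->count -= moved;
	}
	mp->cache[mp->count++] = page;
	return 0;
}
```

In this model the lock is taken only once per BATCH pages, on the
empty/full slow path, so the common get/put stays lockless. Whether that
slow path needs the locked central pool at all is exactly the open
question above.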

> 
> Most high performance TCP flows are using zerocopy, I am really not
> sure why we would
> need to 'optimize' the path that is wasting cpu cycles doing
> user->kernel copies anyway,
> at the cost of insane complexity.

As I understand it, zerocopy is mostly about big packets and the
non-IOMMU case.

As for complexity, I am not yet convinced that it is that complex, as it
mostly uses the existing infrastructure to support tx recycling.

The point is that most skbs are freed in NAPI or socket context, so it
seems we may be able to use that to batch the allocation/freeing of
skb/page_frag, or to reuse the skb/page_frag/dma mapping, and so avoid
(IO/CPU) TLB misses, cache misses, and the overhead of spinlocks and
dma mapping.


> .
> 
