Date:   Fri, 30 Apr 2021 11:01:48 +0800
From:   Yunsheng Lin <linyunsheng@...wei.com>
To:     Ilias Apalodimas <ilias.apalodimas@...aro.org>
CC:     Matteo Croce <mcroce@...ux.microsoft.com>,
        <netdev@...r.kernel.org>, <linux-mm@...ck.org>,
        Ayush Sawal <ayush.sawal@...lsio.com>,
        "Vinay Kumar Yadav" <vinay.yadav@...lsio.com>,
        Rohit Maheshwari <rohitm@...lsio.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
        Marcin Wojtas <mw@...ihalf.com>,
        Russell King <linux@...linux.org.uk>,
        Mirko Lindner <mlindner@...vell.com>,
        Stephen Hemminger <stephen@...workplumber.org>,
        "Tariq Toukan" <tariqt@...dia.com>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        "Alexei Starovoitov" <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        "John Fastabend" <john.fastabend@...il.com>,
        Boris Pismenny <borisp@...dia.com>,
        Arnd Bergmann <arnd@...db.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>, Yu Zhao <yuzhao@...gle.com>,
        Will Deacon <will@...nel.org>,
        Michel Lespinasse <walken@...gle.com>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Roman Gushchin <guro@...com>, Hugh Dickins <hughd@...gle.com>,
        Peter Xu <peterx@...hat.com>, Jason Gunthorpe <jgg@...pe.ca>,
        Guoqing Jiang <guoqing.jiang@...ud.ionos.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Alexander Lobakin <alobakin@...me>,
        Cong Wang <cong.wang@...edance.com>, wenxu <wenxu@...oud.cn>,
        Kevin Hao <haokexin@...il.com>,
        Aleksandr Nogikh <nogikh@...gle.com>,
        Jakub Sitnicki <jakub@...udflare.com>,
        Marco Elver <elver@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Guillaume Nault <gnault@...hat.com>,
        <linux-kernel@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
        <bpf@...r.kernel.org>, Matthew Wilcox <willy@...radead.org>,
        Eric Dumazet <edumazet@...gle.com>,
        David Ahern <dsahern@...il.com>,
        Lorenzo Bianconi <lorenzo@...nel.org>,
        Saeed Mahameed <saeedm@...dia.com>,
        Andrew Lunn <andrew@...n.ch>, Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net-next v3 0/5] page_pool: recycle buffers

On 2021/4/30 2:51, Ilias Apalodimas wrote:
> Hi Yunsheng,
> 
> On Thu, Apr 29, 2021 at 04:27:21PM +0800, Yunsheng Lin wrote:
>> On 2021/4/10 6:37, Matteo Croce wrote:
>>> From: Matteo Croce <mcroce@...rosoft.com>

[...]

>>
>> 1. skb frag page recycling does not need "struct xdp_rxq_info" or
>>    "struct xdp_mem_info" to tie "struct page" to "struct page_pool",
>>    which seems unnecessary at this point if embedding a
>>    "struct page_pool" pointer directly in "struct page" does not
>>    increase its size.
> 
> We can't do that. The reason we need those structs is that we rely on the
> existing XDP code, which already recycles its buffers, to enable
> recycling.  Since we allocate a page per packet when using page_pool for a
> driver, the same ideas apply to an SKB and XDP frame. We just recycle the

I am not really familiar with XDP here, but a packet from hw is either a
"struct xdp_frame"/"struct xdp_buff" for XDP or a "struct sk_buff" for the
TCP/IP stack; a packet cannot be both at the same time, right?

What does not really make sense to me is this: a skb's frag page can only
be recycled if it comes from a page pool, right? If that is true, the
switch statement in __xdp_return() does not really make sense for skb
recycling: why go to all the trouble of checking mem->type and mem->id to
find the page_pool pointer when a recyclable page for a skb can only come
from a page pool?
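
To illustrate what I mean, a rough sketch (untested, not against any tree;
the "pp" back-pointer field in "struct page" is made up here purely for
illustration):

/* If "struct page" carried the pool pointer directly, the skb frag
 * recycling path would not need the mem->type/mem->id lookup at all:
 */
static bool page_pool_return_skb_page(struct page *page)
{
	struct page_pool *pp;

	page = compound_head(page);
	pp = page->pp;		/* hypothetical back-pointer */
	if (!pp)
		return false;	/* not from a page pool, fall back */

	page_pool_put_full_page(pp, page, false);
	return true;
}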

> payload and we don't really care what's in that.  We could rename the
> functions to something more generic in the future though?
> 
>>
>> 2. it would be good to do the page reference count update batching
>>    in the page pool instead of in a specific driver.
>>
>>
>> page_pool_atomic_sub_if_positive() is added to decide who may call
>> page_pool_put_full_page(): the driver and the stack may hold references
>> to the same page, and only the last holder of a reference to a page may
>> call page_pool_put_full_page() to decide whether recycling is possible;
>> if it is not, the page is released. So I am wondering whether a similar
>> page_pool_atomic_sub_if_positive() could be added to the user space
>> address unmapping path to allow skb recycling for RX zerocopy too?
>>
> 
> I would prefer a different page pool type if we wanted to support the split
> page model.  The changes as-is are quite intrusive, since they change the
> entire skb return path.  So I would prefer introducing the changes one at a
> time.

I understand there may be a fundamental semantic change when the split page
model is supported by the page pool, but as I understand it, if we are
careful enough, the split page support mainly affects the skb recycling
path and the drivers that use page pool (XDP too), not the entire skb
return path.

Anyway, one change at a time is always preferred if supporting split pages
proves to be non-trivial and intrusive.
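
Just to show the shape of the interface I have in mind (the function name
and the frag_page/frag_offset fields below are invented; the base reference
held by the pool and the DMA mapping details are ignored here):

/* Hypothetical frag-level allocator on top of page_pool: hand out
 * sub-page chunks from a cached partial page, taking one extra page
 * reference per chunk so recycling can detect the last user.
 */
struct page *page_pool_alloc_frag(struct page_pool *pool,
				  unsigned int *offset,
				  unsigned int size, gfp_t gfp)
{
	struct page *page = pool->frag_page;	/* invented field */

	if (!page || pool->frag_offset + size > PAGE_SIZE) {
		page = page_pool_alloc_pages(pool, gfp);
		if (!page)
			return NULL;
		pool->frag_page = page;
		pool->frag_offset = 0;
	}

	*offset = pool->frag_offset;
	pool->frag_offset += size;
	page_ref_inc(page);	/* one ref per frag handed out */
	return page;
}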

> 
> The fundamental difference between having the recycling in the driver vs
> having it in a generic API is pretty straightforward.  When a driver holds
> the extra page references, it is free to decide what to reuse when it is
> about to refill its Rx descriptors.  So TCP zerocopy might work even if the
> userspace applications hold the buffers for an X amount of time.
> In this proposal though, we *need* to decide what to do with the buffer
> when we are about to free the skb.

I am not sure I understand what you mean by "free the skb": does it mean
that kfree_skb() is called to free the skb?

As I understand it, if the skb completely owns the page (meaning
page_count() == 1) when kfree_skb() is called, __page_pool_put_page() is
called; otherwise page_ref_dec() is called. That is exactly the decision
page_pool_atomic_sub_if_positive() tries to make atomically.
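
To make sure we mean the same thing, roughly what I am proposing (untested
sketch; page_pool_atomic_sub_if_positive() is the helper from my earlier
mail, not an existing kernel function):

/* Drop one reference atomically unless we are the last holder.
 * Returns true if a reference was dropped and other holders remain;
 * returns false if the caller held the last reference and is now
 * responsible for recycling or releasing the page.
 */
static bool page_pool_atomic_sub_if_positive(struct page *page)
{
	int count = page_ref_count(page);

	do {
		if (count == 1)
			return false;	/* last holder: recycle */
	} while (!atomic_try_cmpxchg(&page->_refcount, &count,
				     count - 1));

	return true;			/* others still hold refs */
}

Then kfree_skb() and the RX zerocopy unmapping path could both do:

	if (!page_pool_atomic_sub_if_positive(page))
		page_pool_put_full_page(pool, page, false);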

> 
> [...]
> 
> 
> Cheers
> /Ilias
