Message-ID: <20210507121953.59e22aa8@carbon>
Date:   Fri, 7 May 2021 12:19:53 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Yunsheng Lin <linyunsheng@...wei.com>
Cc:     brouer@...hat.com, Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Matteo Croce <mcroce@...ux.microsoft.com>,
        <netdev@...r.kernel.org>, <linux-mm@...ck.org>,
        Ayush Sawal <ayush.sawal@...lsio.com>,
        "Vinay Kumar Yadav" <vinay.yadav@...lsio.com>,
        Rohit Maheshwari <rohitm@...lsio.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
        Marcin Wojtas <mw@...ihalf.com>,
        Russell King <linux@...linux.org.uk>,
        Mirko Lindner <mlindner@...vell.com>,
        Stephen Hemminger <stephen@...workplumber.org>,
        "Tariq Toukan" <tariqt@...dia.com>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        "Alexei Starovoitov" <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        "John Fastabend" <john.fastabend@...il.com>,
        Boris Pismenny <borisp@...dia.com>,
        Arnd Bergmann <arnd@...db.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>, Yu Zhao <yuzhao@...gle.com>,
        Will Deacon <will@...nel.org>,
        Michel Lespinasse <walken@...gle.com>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Roman Gushchin <guro@...com>, Hugh Dickins <hughd@...gle.com>,
        Peter Xu <peterx@...hat.com>, Jason Gunthorpe <jgg@...pe.ca>,
        Guoqing Jiang <guoqing.jiang@...ud.ionos.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Alexander Lobakin <alobakin@...me>,
        Cong Wang <cong.wang@...edance.com>, wenxu <wenxu@...oud.cn>,
        Kevin Hao <haokexin@...il.com>,
        Aleksandr Nogikh <nogikh@...gle.com>,
        Jakub Sitnicki <jakub@...udflare.com>,
        Marco Elver <elver@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Guillaume Nault <gnault@...hat.com>,
        <linux-kernel@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
        <bpf@...r.kernel.org>, Matthew Wilcox <willy@...radead.org>,
        Eric Dumazet <edumazet@...gle.com>,
        David Ahern <dsahern@...il.com>,
        Lorenzo Bianconi <lorenzo@...nel.org>,
        Saeed Mahameed <saeedm@...dia.com>,
        Andrew Lunn <andrew@...n.ch>, Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net-next v3 0/5] page_pool: recycle buffers

On Fri, 7 May 2021 16:28:30 +0800
Yunsheng Lin <linyunsheng@...wei.com> wrote:

> On 2021/5/7 15:06, Ilias Apalodimas wrote:
> > On Fri, May 07, 2021 at 11:23:28AM +0800, Yunsheng Lin wrote:  
> >> On 2021/5/6 20:58, Ilias Apalodimas wrote:  
> >>>>>>  
> >>>>>
> >>>>> Not really, the opposite is happening here. If the pp_recycle bit is set we
> >>>>> will always call page_pool_return_skb_page().  If the page signature matches
> >>>>> the 'magic' set by page pool, we will always call xdp_return_skb_frame(), which
> >>>>> will end up calling __page_pool_put_page(). If the refcnt is 1 we'll try
> >>>>> to recycle the page.  If it's not, we'll release it from page_pool (releasing
> >>>>> some internal references we keep), unmap the buffer and decrement the refcnt.  
> >>>>
> >>>> Yes, I understood the above is what the page pool does now.
> >>>>
> >>>> But the question is: who is still holding an extra reference to the page at
> >>>> kfree_skb() time? Perhaps a cloned and pskb_expand_head()'ed skb is holding an
> >>>> extra reference to the same page? So why not just do a page_ref_dec() if the
> >>>> original skb is freed first, and call __page_pool_put_page() when the cloned skb
> >>>> is freed later? That way we can always reuse the recyclable page from a
> >>>> recyclable skb. This may make the page_pool_destroy() process take longer than
> >>>> before, but I suppose the page_pool_destroy() delay for the cloned skb case does
> >>>> not really matter here.
> >>>>
> >>>> If the above works, I think similar handling could be added to RX zerocopy if
> >>>> RX zerocopy also holds extra references to the recyclable page from a recyclable
> >>>> skb?
> >>>>  
> >>>
> >>> Right, this sounds doable, but I'll have to go back and code it to see if it
> >>> really makes sense.  However, I'd still prefer the support to go in as-is
> >>> (including the struct xdp_mem_info in struct page, instead of a page_pool
> >>> pointer).
> >>>
> >>> There are a couple of reasons for that.  If we keep the struct xdp_mem_info we
> >>> can, in the future, recycle different kinds of buffers using __xdp_return().
> >>> And it is a non-intrusive change if we choose to store the page pool address
> >>> directly in the future.  It just affects the internal contract between the
> >>> page_pool code and struct page.  So it won't affect any drivers that already
> >>> use the feature.  
> >>
> >> This patchset has embedded a signature field in "struct page", and xdp_mem_info
> >> is stored in page_private(), which does not seem to consider the case of
> >> associating the page pool with "struct page" directly yet?
> > 
> > Correct
> >   
> >> Is the page pool also stored in
> >> page_private(), with a different signature used to indicate that?  
> > 
> > No, only struct xdp_mem_info, as you mentioned before
> >   
> >>
> >> I am not saying we have to do it in this patchset, but we have to consider it
> >> while we are adding a new signature field to "struct page", right?  
> > 
> > We won't need a new signature.  The signature in both cases is there to
> > guarantee the page you are trying to recycle was indeed allocated by page_pool.
> > 
> > Basically we have two design choices here:
> > - We store the page_pool ptr address directly in page->private and then
> >   call into the page_pool APIs directly to do the recycling.
> >   That would eliminate the lookup through xdp_mem_info and the
> >   XDP helpers to locate the page pool pointer (through __xdp_return()).
> > - We store the xdp_mem_info in page_private().  In that case you need to go
> >   through __xdp_return() to locate the page_pool pointer. Although we might
> >   lose some performance, that would allow us to recycle additional memory types
> >   and not only MEM_TYPE_PAGE_POOL (in case we ever need it).  
>
> So the signature field in "struct page" is only used to indicate that a page is
> from a page pool. Then how do we tell what page_private() contains if both of
> the above choices are needed? We might still need an extra indicator to tell
> whether page_private() holds a page_pool ptr or an xdp_mem_info.

The signature field in "struct page" plus "xdp_mem_info" is a double
construct that was introduced in this patchset.  AFAIK Matteo took the
idea from Jonathan's patchset.  I'm not convinced we need both; maybe
later we will (when someone introduces a new mem model a la NetGPU).

I think Jonathan's use-case was NetGPU[1] (which got shut down due to
Nvidia[2] being involved, which I think was unfair).  The general idea
behind NetGPU makes sense to me: packet headers live in the first page,
while the second page belongs to the hardware.  This implies that an
SKB can point to two different pages with different memory types,
which needs to be handled correctly when freeing the SKB and the pages it
points to.  Thus, placing (xdp_)mem_info in the SKB is wrong, as it implies
all pages belong to the same mem_info.type.

The point is, when designing this I want us to think about how our
design can handle other memory models than just page_pool.

In this patchset's design, we use a single bit in the SKB to indicate that
the pages it points to come from another memory model; currently
page_pool is the only user of this bit.  The remaining info about the
memory model (page_pool) is stored in struct page, which we look at
when freeing the pages that the SKB points to (that is, at the layer
above the MM calls that would free the page for real).
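
To make that concrete, here is a minimal sketch of the check on the SKB free
path, using the names from the discussion above (pp_recycle bit, page
signature, page_pool_return_skb_page()); the exact field and helper names are
assumptions about this patchset, not settled kernel API:

/* Sketch only: skb->pp_recycle, page->pp_magic, PP_SIGNATURE and
 * page_pool_return_skb_page() follow the names used in this thread and
 * are assumptions, not final API.
 */
static bool skb_page_try_recycle(struct sk_buff *skb, struct page *page)
{
	/* Single bit in the SKB: its pages may belong to another memory model */
	if (!skb->pp_recycle)
		return false;

	/* Per-page signature: only pages allocated by a page_pool qualify */
	if (page->pp_magic != PP_SIGNATURE)
		return false;

	/* Hand the page back to its memory model (page_pool here), which
	 * either recycles it or unmaps and releases it for real.
	 */
	return page_pool_return_skb_page(page);
}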


[1] https://linuxplumbersconf.org/event/7/contributions/670/
[2] https://lwn.net/Articles/827596/

> It seems storing the page pool ptr in page_private() is a clear use case for a
> recyclable page from a recyclable skb, while the use case for storing xdp_mem_info
> in page_private() is still unclear? XDP already has the xdp_mem_info in
> "struct xdp_frame", so it does not need the xdp_mem_info from page_private().
> 
> If the above is true, what does not really make sense to me here is this:
> why do we first implement an unclear use case for storing xdp_mem_info in
> page_private(), instead of implementing the clear use case of storing the page
> pool ptr in page_private() first?

I'm actually not against storing the page_pool object ptr directly in
struct page.  It is a nice optimization.  Perhaps we should implement
this optimization outside this patchset first, and let __xdp_return()
for XDP-redirected packets also take advantage of it?

Then it would feel more natural to also use this optimization in this
patchset, right?
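
As a rough illustration of that optimization (outside this patchset), the pool
could stash a pointer to itself via the existing page_private() helpers when
handing out a page, and the recycle path could read it back without going
through __xdp_return(); the signature check would still be needed to guard
against pages that never came from a page_pool:

/* Illustration only; assumes the page signature check above still guards
 * against pages that were never owned by a page_pool.
 */

/* On allocation: the pool records itself in the page. */
static void page_pool_set_owner(struct page *page, struct page_pool *pool)
{
	set_page_private(page, (unsigned long)pool);
}

/* On free: the recycle path finds the owning pool directly, without
 * the xdp_mem_info lookup through __xdp_return().
 */
static struct page_pool *page_pool_get_owner(struct page *page)
{
	return (struct page_pool *)page_private(page);
}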

> > 
> > 
> > I think both choices are sane.  What I am trying to explain here is that,
> > regardless of what we choose now, we can change it in the future without
> > affecting the API consumers at all.  What will change internally is the way we
> > look up the page pool pointer we are trying to recycle.  
> 
> It seems the below API needs changing?
> +static inline void skb_mark_for_recycle(struct sk_buff *skb, struct page *page,
> +					struct xdp_mem_info *mem)

I don't think we need to change this API to support future memory
models.  Notice that xdp_mem_info has a 'type' member.
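
As a hedged sketch of how a driver might use it (only the skb_mark_for_recycle()
signature comes from the patch above; where the xdp_mem_info lives, here a
hypothetical rxq->xdp_rxq.mem, is driver-specific):

/* Hypothetical driver RX path; rxq and its xdp_rxq.mem are placeholders. */
static void drv_rx_finish_skb(struct drv_rx_queue *rxq, struct sk_buff *skb,
			      struct page *page)
{
	/* Mark the SKB so its pages are returned to the recyclable memory
	 * model on free; mem->type (MEM_TYPE_PAGE_POOL today) tells the
	 * stack which model that is, so the call also covers future types.
	 */
	skb_mark_for_recycle(skb, page, &rxq->xdp_rxq.mem);
}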

Naming in Computer Science is a hard problem ;-). Something that seems
to confuse a lot of people is the name of the struct "xdp_mem_info".
Maybe we should have named it "mem_info" or "net_mem_info" instead, as
using it doesn't imply that the device is running XDP.

I see XDP as the RX layer before the network stack that helps drivers
support different memory models, also for handling normal packets
that don't get processed by XDP; the drivers don't even need to
support XDP to use the "xdp_mem_info" type.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

struct xdp_mem_info {
	u32 type; /* enum xdp_mem_type, but known size type */
	u32 id;
};

enum xdp_mem_type {
	MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */
	MEM_TYPE_PAGE_ORDER0,     /* Orig XDP full page model */
	MEM_TYPE_PAGE_POOL,
	MEM_TYPE_XSK_BUFF_POOL,
	MEM_TYPE_MAX,
};
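
For reference, the lookup through __xdp_return() discussed above is essentially
a dispatch on mem->type; a simplified sketch follows (the real function in
net/core/xdp.c resolves the pool via mem->id and also handles napi_direct, DMA
sync, etc.; lookup_pool_by_id() stands in for that rhashtable lookup and is not
a real kernel helper):

static void xdp_return_sketch(void *data, struct xdp_mem_info *mem)
{
	struct page *page = virt_to_head_page(data);

	switch (mem->type) {
	case MEM_TYPE_PAGE_POOL:
		/* The indirection a direct page_pool pointer in struct page
		 * would avoid: resolve mem->id to the owning pool first.
		 */
		page_pool_put_full_page(lookup_pool_by_id(mem->id), page, false);
		break;
	case MEM_TYPE_PAGE_SHARED:
		page_frag_free(data);
		break;
	case MEM_TYPE_PAGE_ORDER0:
		put_page(page);
		break;
	default:
		break;
	}
}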
