lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 17 May 2021 12:36:39 +0300
From:   Ilias Apalodimas <ilias.apalodimas@...aro.org>
To:     Yunsheng Lin <linyunsheng@...wei.com>
Cc:     Matteo Croce <mcroce@...ux.microsoft.com>, netdev@...r.kernel.org,
        linux-mm@...ck.org, Ayush Sawal <ayush.sawal@...lsio.com>,
        Vinay Kumar Yadav <vinay.yadav@...lsio.com>,
        Rohit Maheshwari <rohitm@...lsio.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
        Marcin Wojtas <mw@...ihalf.com>,
        Russell King <linux@...linux.org.uk>,
        Mirko Lindner <mlindner@...vell.com>,
        Stephen Hemminger <stephen@...workplumber.org>,
        Tariq Toukan <tariqt@...dia.com>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        John Fastabend <john.fastabend@...il.com>,
        Boris Pismenny <borisp@...dia.com>,
        Arnd Bergmann <arnd@...db.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>, Yu Zhao <yuzhao@...gle.com>,
        Will Deacon <will@...nel.org>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Roman Gushchin <guro@...com>, Hugh Dickins <hughd@...gle.com>,
        Peter Xu <peterx@...hat.com>, Jason Gunthorpe <jgg@...pe.ca>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Alexander Lobakin <alobakin@...me>,
        Cong Wang <cong.wang@...edance.com>, wenxu <wenxu@...oud.cn>,
        Kevin Hao <haokexin@...il.com>,
        Jakub Sitnicki <jakub@...udflare.com>,
        Marco Elver <elver@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Guillaume Nault <gnault@...hat.com>,
        linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org,
        bpf@...r.kernel.org, Matthew Wilcox <willy@...radead.org>,
        Eric Dumazet <edumazet@...gle.com>,
        David Ahern <dsahern@...il.com>,
        Lorenzo Bianconi <lorenzo@...nel.org>,
        Saeed Mahameed <saeedm@...dia.com>,
        Andrew Lunn <andrew@...n.ch>, Paolo Abeni <pabeni@...hat.com>,
        Sven Auhagen <sven.auhagen@...eatech.de>
Subject: Re: [PATCH net-next v5 3/5] page_pool: Allow drivers to hint on SKB
 recycling

> >>

[...]

> >> In that case, the skb_mark_for_recycle() could only set the skb->pp_recycle,
> >> but not the pool->p.
> >>
> >>>
> >>>>
> >>>> Another thing accured to me is that if the driver use page from the
> >>>> page pool to form a skb, and it does not call skb_mark_for_recycle(),
> >>>> then there will be resource leaking, right? if yes, it seems the
> >>>> skb_mark_for_recycle() call does not seems to add any value?
> >>>>
> >>>
> >>> Not really, the driver has 2 choices:
> >>> - call page_pool_release_page() once it receives the payload. That will
> >>>   clean up dma mappings (if page pool is responsible for them) and free the
> >>>   buffer
> >>
> >> The is only needed before SKB recycling is supported or the driver does not
> >> want the SKB recycling support explicitly, right?
> >>
> > 
> > This is needed in general even before recycling.  It's used to unmap the
> > buffer, so once you free the SKB you don't leave any stale DMA mappings.  So
> > that's what all the drivers that use page_pool call today.
> 
> As my understanding:
> 1. If the driver is using page allocated from page allocator directly to
>    form a skb, let's say the page is owned by skb(or not owned by anyone:)),
>    when a skb is freed, the put_page() should be called.
> 
> 2. If the driver is using page allocated from page pool to form a skb, let's
>    say the page is owned by page pool, when a skb is freed, page_pool_put_page()
>    should be called.
> 
> What page_pool_release_page() mainly do is to make page in case 2 return back
> to case 1.

Yea but this is done deliberately.  Let me try to explain the reasoning a
bit.  I don't think mixing the SKB path with page_pool is the right idea. 
page_pool allocates the memory you want to build an SKB and imho it must be 
kept completely disjoint with the generic SKB code.  So once you free an SKB,
I don't like having page_pool_put_page() in the release code explicitly.  
What we do instead is call page_pool_release_page() from the driver.  So the 
page is disconnected from page pool and the skb release path works as it used 
to.

> 
> And page_pool_release_page() is replaced with skb_mark_for_recycle() in patch
> 4/5 to avoid the above "case 2" -> "case 1" changing, so that the page is still
> owned by page pool, right?
> 
> So the point is that skb_mark_for_recycle() does not really do anything about
> the owner of the page, it is still owned by page pool, so it makes more sense
> to keep the page pool ptr instead of setting it every time when
> skb_mark_for_recycle() is called?

Yes it doesn't do anything wrt to ownership.  The page must always come
from page pool if you want to recycle it. But as I tried to explain above,
it felt more intuitive to keep the driver flow as-is as well as  the
release path.  On a driver right now when you are done with the skb creation, 
you unmap the skb->head + fragments.  So if you want to recycle it it instead, 
you mark the skb and fragments.

> 
> > 
> >>> - call skb_mark_for_recycle(). Which will end up recycling the buffer.
> >>
> >> If the driver need to add extra flag to enable recycling based on skb
> >> instead of page pool, then adding skb_mark_for_recycle() makes sense to
> >> me too, otherwise it seems adding a field in pool->p to recycling based
> >> on skb makes more sense?
> >>
> > 
> > The recycling is essentially an SKB feature though isn't it?  You achieve the
> > SKB recycling with the help of page_pool API, not the other way around.  So I
> > think this should remain on the SKB and maybe in the future find ways to turn
> > in on/off?
> 
> As above, does it not make more sense to call page_pool_release_page() if the
> driver does not need the SKB recycling?

Call it were? As i tried to explain it makes no sense to me having it in
generic SKB code (unless recycling is enabled).

That's what's happening right now when recycling is enabled.
Basically the call path is:
if (skb bit is set) {
	if (page signature matches)
		page_pool_put_full_page() 
}
page_pool_put_full_page() will either:
1. recycle the page in the 'fast cache' of page pool
2. recycle the page in the ptr ring of page pool
3. Release it calling page_pool_release_page()

If you don't want to enable it you just call page_pool_release_page() on
your driver and the generic path will free the allocated page.

> 
> Even if when skb->pp_recycle is 1, pages allocated from page allocator directly
> or page pool are both supported, so it seems page->signature need to be reliable
> to indicate a page is indeed owned by a page pool, which means the skb->pp_recycle
> is used mainly to short cut the code path for skb->pp_recycle is 0 case, so that
> the page->signature does not need checking?

Yes, the idea for the recycling bit, is that you don't have to fetch the page
in cache do do more processing (since freeing is asynchronous and we
can't have any guarantees on what the cache will have at that point).  So we
are trying to affect the existing release path a less as possible. However it's
that new skb bit that triggers the whole path.

What you propose could still be doable though.  As you said we can add the
page pointer to struct page when we allocate a page_pool page and never
reset it when we recycle the buffer. But I don't think there will be any
performance impact whatsoever. So I prefer the 'visible' approach, at least for
the first iteration.

Thanks
/Ilias
 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ