netdev - Re: [PATCH net-next v7 4/4] skbuff: Optimization of SKB coalescing for page pool

Message-ID: <CAHS8izNQeSwWQ9NwiDUcPoSX1WONG4JYu2rfpqF3+4xkxE=Wyw@mail.gmail.com>
Date: Fri, 8 Dec 2023 18:18:42 -0800
From: Mina Almasry <almasrymina@...gle.com>
To: Liang Chen <liangchen.linux@...il.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, 
	pabeni@...hat.com, hawk@...nel.org, ilias.apalodimas@...aro.org, 
	linyunsheng@...wei.com, netdev@...r.kernel.org, linux-mm@...ck.org, 
	jasowang@...hat.com
Subject: Re: [PATCH net-next v7 4/4] skbuff: Optimization of SKB coalescing
 for page pool

On Wed, Dec 6, 2023 at 2:54 AM Liang Chen <liangchen.linux@...il.com> wrote:
>
> In order to address the issues encountered with commit 1effe8ca4e34
> ("skbuff: fix coalescing for page_pool fragment recycling"), the
> combination of the following condition was excluded from skb coalescing:
>
> from->pp_recycle = 1
> from->cloned = 1
> to->pp_recycle = 1
>
> However, with page pool environments, the aforementioned combination can
> be quite common (e.g. NetworkManager may lead to an additional
> packet_type being registered, thus the cloning). In scenarios with a
> higher number of small packets, it can significantly affect the success
> rate of coalescing. For example, considering packets of 256 bytes size,
> our comparison of coalescing success rate is as follows:
>
> Without page pool: 70%
> With page pool: 13%
>
> Consequently, this has an impact on performance:
>
> Without page pool: 2.57 Gbits/sec
> With page pool: 2.26 Gbits/sec
>
> Therefore, it seems worthwhile to optimize this scenario and enable
> coalescing of this particular combination. To achieve this, we need to
> ensure the correct increment of the "from" SKB page's page pool
> reference count (pp_ref_count).
>
> Following this optimization, the success rate of coalescing measured in
> our environment has improved as follows:
>
> With page pool: 60%
>
> This success rate is approaching the rate achieved without using page
> pool, and the performance has also been improved:
>
> With page pool: 2.52 Gbits/sec
>
> Below is the performance comparison for small packets before and after
> this optimization. We observe no impact to packets larger than 4K.
>
> packet size     before      after       improved
> (bytes)         (Gbits/sec) (Gbits/sec)
> 128             1.19        1.27        7.13%
> 256             2.26        2.52        11.75%
> 512             4.13        4.81        16.50%
> 1024            6.17        6.73        9.05%
> 2048            14.54       15.47       6.45%
> 4096            25.44       27.87       9.52%
>
> Signed-off-by: Liang Chen <liangchen.linux@...il.com>
> Reviewed-by: Yunsheng Lin <linyunsheng@...wei.com>
> Suggested-by: Jason Wang <jasowang@...hat.com>
> ---
>  include/net/page_pool/helpers.h |  5 ++++
>  net/core/skbuff.c               | 41 +++++++++++++++++++++++----------
>  2 files changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> index 9dc8eaf8a959..268bc9d9ffd3 100644
> --- a/include/net/page_pool/helpers.h
> +++ b/include/net/page_pool/helpers.h
> @@ -278,6 +278,11 @@ static inline long page_pool_unref_page(struct page *page, long nr)
>         return ret;
>  }
>
> +static inline void page_pool_ref_page(struct page *page)
> +{
> +       atomic_long_inc(&page->pp_ref_count);
> +}
> +
>  static inline bool page_pool_is_last_ref(struct page *page)
>  {
>         /* If page_pool_unref_page() returns 0, we were the last user */
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7e26b56cda38..3c2515a29376 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -947,6 +947,24 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
>         return napi_pp_put_page(virt_to_page(data), napi_safe);
>  }
>
> +/**
> + * skb_pp_frag_ref() - Increase fragment reference count of a page
> + * @page:      page of the fragment on which to increase a reference
> + *
> + * Increase fragment reference count (pp_ref_count) on a page, but if it is
> + * not a page pool page, fallback to increase a reference(_refcount) on a
> + * normal page.
> + */
> +static void skb_pp_frag_ref(struct page *page)
> +{
> +       struct page *head_page = compound_head(page);
> +
> +       if (likely(is_pp_page(head_page)))
> +               page_pool_ref_page(head_page);
> +       else
> +               page_ref_inc(head_page);
> +}
> +

I am confused by this. Why add a new helper instead of modifying the
existing helper, skb_frag_ref()?

My mental model is that if the net stack wants to acquire a reference
on a frag, it calls skb_frag_ref(), and if it wants to drop a
reference on a frag, it calls skb_frag_unref(). Internally,
skb_frag_ref/unref() can do whatever checking is needed to decide
whether to update page->_refcount or page->pp_ref_count. I can't wrap
my head around introducing skb_pp_frag_ref() with no equivalent
skb_pp_frag_unref().

But even if skb_pp_frag_unref() were added, when should the net stack
use skb_frag_ref/unref(), and when should it use
skb_pp_frag_ref/unref()? The docs currently describe what the function
does, but not when a programmer unfamiliar with the page pool should
use it.

>  static void skb_kfree_head(void *head, unsigned int end_offset)
>  {
>         if (end_offset == SKB_SMALL_HEAD_HEADROOM)
> @@ -5769,17 +5787,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>                 return false;
>
>         /* In general, avoid mixing page_pool and non-page_pool allocated
> -        * pages within the same SKB. Additionally avoid dealing with clones
> -        * with page_pool pages, in case the SKB is using page_pool fragment
> -        * references (page_pool_alloc_frag()). Since we only take full page
> -        * references for cloned SKBs at the moment that would result in
> -        * inconsistent reference counts.
> -        * In theory we could take full references if @from is cloned and
> -        * !@to->pp_recycle but its tricky (due to potential race with
> -        * the clone disappearing) and rare, so not worth dealing with.
> +        * pages within the same SKB. In theory we could take full
> +        * references if @from is cloned and !@to->pp_recycle but its
> +        * tricky (due to potential race with the clone disappearing) and
> +        * rare, so not worth dealing with.
>          */
> -       if (to->pp_recycle != from->pp_recycle ||
> -           (from->pp_recycle && skb_cloned(from)))
> +       if (to->pp_recycle != from->pp_recycle)
>                 return false;
>
>         if (len <= skb_tailroom(to)) {
> @@ -5836,8 +5849,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>         /* if the skb is not cloned this does nothing
>          * since we set nr_frags to 0.
>          */
> -       for (i = 0; i < from_shinfo->nr_frags; i++)
> -               __skb_frag_ref(&from_shinfo->frags[i]);
> +       if (from->pp_recycle)
> +               for (i = 0; i < from_shinfo->nr_frags; i++)
> +                       skb_pp_frag_ref(skb_frag_page(&from_shinfo->frags[i]));
> +       else
> +               for (i = 0; i < from_shinfo->nr_frags; i++)
> +                       __skb_frag_ref(&from_shinfo->frags[i]);

You added a check here to use skb_pp_frag_ref() instead of
skb_frag_ref(), but it's not clear to me why the other callsites of
skb_frag_ref() don't need to be modified in the same way after your
patch.

After your patch:

- skb_frag_ref() will always increment page->_refcount.
- skb_frag_unref() will either decrement page->_refcount or decrement
  page->pp_ref_count (depending on the value of skb->pp_recycle).
- skb_pp_frag_ref() will either increment page->_refcount or increment
  page->pp_ref_count (depending on the value of is_pp_page(), not
  skb->pp_recycle).
- skb_pp_frag_unref() doesn't exist.

Is this not confusing? Can we streamline things:

- skb_frag_ref() increments page->pp_ref_count for skb->pp_recycle,
  page->_refcount otherwise.
- skb_frag_unref() decrements page->pp_ref_count for skb->pp_recycle,
  page->_refcount otherwise.

Or am I missing something that causes us to require this asymmetric
reference counting?
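
To make the streamlined variant concrete, here is a rough userspace
model of what I have in mind. It is only a sketch, not kernel code:
struct model_page, struct model_skb, model_frag_ref() and
model_frag_unref() are hypothetical stand-ins for struct page,
struct sk_buff and the frag helpers, and plain longs replace the
atomic counters the kernel actually uses. The point is only that ref
and unref pick the counter by the same rule.

```c
/* Userspace model only: every name here is a hypothetical stand-in,
 * not the kernel's struct page / struct sk_buff or its helpers.
 */
#include <assert.h>
#include <stdbool.h>

struct model_page {
	long refcount;     /* stands in for page->_refcount */
	long pp_ref_count; /* stands in for page->pp_ref_count */
};

struct model_skb {
	bool pp_recycle;   /* stands in for skb->pp_recycle */
};

/* Take a frag reference: the counter is chosen by skb->pp_recycle. */
static void model_frag_ref(const struct model_skb *skb,
			   struct model_page *page)
{
	if (skb->pp_recycle)
		page->pp_ref_count++;
	else
		page->refcount++;
}

/* Drop a frag reference using the same rule as model_frag_ref(). */
static void model_frag_unref(const struct model_skb *skb,
			     struct model_page *page)
{
	if (skb->pp_recycle) {
		assert(page->pp_ref_count > 0);
		page->pp_ref_count--;
	} else {
		assert(page->refcount > 0);
		page->refcount--;
	}
}
```

With this shape, a ref followed by an unref on the same skb always
returns both counters to their starting values, whichever counter the
skb uses, so callers never need to know which one is in play.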

>
>         to->truesize += delta;
>         to->len += len;
> --
> 2.31.1
>
>


-- 
Thanks,
Mina