netdev - Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case

Message-ID: <CAKhg4tJkprS+dFcpLALP_e1kpHJ-DwabOMFaXxsPx+7O0c-geQ@mail.gmail.com>
Date: Thu, 29 Jun 2023 20:17:23 +0800
From: Liang Chen <liangchen.linux@...il.com>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: ilias.apalodimas@...aro.org, hawk@...nel.org, kuba@...nel.org, 
	davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com, 
	netdev@...r.kernel.org
Subject: Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case

On Thu, Jun 29, 2023 at 2:53 PM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>
> On 2023/6/28 20:11, Liang Chen wrote:
> > In order to address the issues encountered with commit 1effe8ca4e34
> > ("skbuff: fix coalescing for page_pool fragment recycling"), the
> > combination of the following conditions was excluded from skb coalescing:
> >
> > from->pp_recycle = 1
> > from->cloned = 1
> > to->pp_recycle = 1
> >
> > However, with page pool environments, the aforementioned combination can
> > be quite common. In scenarios with a higher number of small packets, it
> > can significantly affect the success rate of coalescing. For example,
> > with 256-byte packets, our comparison of coalescing success rates is
> > as follows:
>
> skb_try_coalesce() only allows coalescing when the 'to' skb is not cloned.
>
> Could you give more detail about the testing when we have a non-cloned
> 'to' skb and a cloned 'from' skb? Both of them should belong to the
> same flow.
>
> I had the patchset below, which tries to do something similar to this patch:
> https://lore.kernel.org/all/20211009093724.10539-5-linyunsheng@huawei.com/
>
> It seems this patch is only trying to optimize a specific case of skb
> coalescing. So if skb coalescing between a non-cloned and a cloned skb is a
> common case, then it might be worth optimizing.
>

Sure, thanks for the information! The test is just an ordinary iperf3
run, as shown below.

iperf3 -c <server IP> -i 5 -f g -t 0 -l 128

We counted how often each combination of the pp (page pool) and clone
conditions occurs on entry to skb_try_coalesce(). The results motivated
us to propose this optimization, as we noticed that case 11 (from
pp/clone=1/1 and to pp/clone=1/0) occurs quite often.

+-------------+--------------+--------------+--------------+--------------+
| to \ from   | pp/clone=0/0 | pp/clone=0/1 | pp/clone=1/0 | pp/clone=1/1 |
+-------------+--------------+--------------+--------------+--------------+
|pp/clone=0/0 | 0            | 1            | 2            | 3            |
|pp/clone=0/1 | 4            | 5            | 6            | 7            |
|pp/clone=1/0 | 8            | 9            | 10           | 11           |
|pp/clone=1/1 | 12           | 13           | 14           | 15           |
+-------------+--------------+--------------+--------------+--------------+
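
To make the case numbering explicit, here is a minimal user-space sketch of how the four flags map to the case numbers in the table. The struct and helper names (coalesce_case, case_index, case_decode) are made up for illustration and are not part of the patch. It assumes that rows select the 'to' skb and columns the 'from' skb, which is the reading that matches the description of case 11.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper type, not from the patch: the four conditions
 * sampled on entry to skb_try_coalesce(). */
struct coalesce_case {
	bool to_pp, to_cloned;     /* 'to' skb: pp_recycle, cloned */
	bool from_pp, from_cloned; /* 'from' skb: pp_recycle, cloned */
};

/* Pack the flags into the 0..15 case number used by the table:
 * the 'to' flags pick the row, the 'from' flags pick the column. */
static inline int case_index(struct coalesce_case c)
{
	return (c.to_pp << 3) | (c.to_cloned << 2) |
	       (c.from_pp << 1) | c.from_cloned;
}

/* Inverse mapping: recover the four flags from a case number. */
static inline struct coalesce_case case_decode(int idx)
{
	assert(idx >= 0 && idx < 16);

	struct coalesce_case c = {
		.to_pp       = (idx >> 3) & 1,
		.to_cloned   = (idx >> 2) & 1,
		.from_pp     = (idx >> 1) & 1,
		.from_cloned = idx & 1,
	};
	return c;
}
```

With this encoding, a non-cloned page pool 'to' skb combined with a cloned page pool 'from' skb gives case_index() == 11.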


packet size 128:
total : 152903
0 : 0            (0%)
1 : 0            (0%)
2 : 0            (0%)
3 : 0            (0%)
4 : 0            (0%)
5 : 0            (0%)
6 : 0            (0%)
7 : 0            (0%)
8 : 0            (0%)
9 : 0            (0%)
10 : 20681       (13%)
11 : 90136       (58%)
12 : 0           (0%)
13 : 0           (0%)
14 : 0           (0%)
15 : 42086       (27%)
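
As a rough cross-check of the counts above, the sketch below models only the entry checks discussed in this thread: the 'to' skb must not be cloned, pp_recycle must match, and (before the patch) a cloned page pool 'from' skb is rejected. The helper names are hypothetical, and skb_try_coalesce() performs further checks that are not modelled here, so the numbers are an upper bound on eligibility, not a measured success rate.

```c
#include <assert.h>
#include <stdbool.h>

enum { NCASES = 16 };

/* Counts reported for the 128-byte run; all other cases were 0. */
static const unsigned long counts[NCASES] = {
	[10] = 20681, [11] = 90136, [15] = 42086,
};

/* Model of the entry checks only (assumed bit layout: to_pp, to_cloned,
 * from_pp, from_cloned from the most to the least significant bit). */
static inline bool passes_gate(int idx, bool patched)
{
	bool to_pp = idx & 8, to_cloned = idx & 4;
	bool from_pp = idx & 2, from_cloned = idx & 1;

	if (to_cloned)                          /* 'to' must not be cloned */
		return false;
	if (to_pp != from_pp)                   /* pp_recycle must match */
		return false;
	if (!patched && from_pp && from_cloned) /* check the patch removes */
		return false;
	return true;
}

static inline unsigned long total_count(void)
{
	unsigned long sum = 0;

	for (int i = 0; i < NCASES; i++)
		sum += counts[i];
	return sum;
}

static inline unsigned long passing_count(bool patched)
{
	unsigned long sum = 0;

	for (int i = 0; i < NCASES; i++)
		if (passes_gate(i, patched))
			sum += counts[i];
	return sum;
}

/* Integer percentage, truncated like the figures in the list above. */
static inline unsigned long percent_floor(unsigned long part,
					  unsigned long total)
{
	assert(total > 0);
	return part * 100 / total;
}
```

Under this model only case 10 passes the checks before the patch, while cases 10 and 11 both pass after it; case 15 is rejected either way because the 'to' skb is cloned.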

Thanks,
Liang


>
> >
> > Without page pool: 70%
> > With page pool: 13%
> >
>
> ...
>
> > diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> > index 126f9e294389..05e5d8ead63b 100644
> > --- a/include/net/page_pool.h
> > +++ b/include/net/page_pool.h
> > @@ -399,4 +399,25 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
> >               page_pool_update_nid(pool, new_nid);
> >  }
> >
> > +static inline bool page_pool_is_pp_page(struct page *page)
> > +{
> > +     return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
> > +}
> > +
> > +static inline bool page_pool_is_pp_page_frag(struct page *page)
> > +{
> > +     return !!(page->pp->p.flags & PP_FLAG_PAGE_FRAG);
> > +}
> > +
> > +static inline void page_pool_page_ref(struct page *page)
> > +{
> > +     struct page *head_page = compound_head(page);
>
> It seems we could avoid adding head_page here:
> page = compound_head(page);
>
> > +
> > +     if (page_pool_is_pp_page(head_page) &&
> > +                     page_pool_is_pp_page_frag(head_page))
> > +             atomic_long_inc(&head_page->pp_frag_count);
> > +     else
> > +             get_page(head_page);
>
> page_ref_inc() should be enough here instead of get_page(),
> as compound_head() has already been called.
>
> > +}
> > +
> >  #endif /* _NET_PAGE_POOL_H */
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 6c5915efbc17..9806b091f0f6 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5666,8 +5666,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> >        * !@to->pp_recycle but its tricky (due to potential race with
> >        * the clone disappearing) and rare, so not worth dealing with.
> >        */
> > -     if (to->pp_recycle != from->pp_recycle ||
> > -         (from->pp_recycle && skb_cloned(from)))
> > +     if (to->pp_recycle != from->pp_recycle)
> >               return false;
> >
> >       if (len <= skb_tailroom(to)) {
> > @@ -5724,8 +5723,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> >       /* if the skb is not cloned this does nothing
> >        * since we set nr_frags to 0.
> >        */
> > -     for (i = 0; i < from_shinfo->nr_frags; i++)
> > -             __skb_frag_ref(&from_shinfo->frags[i]);
> > +     if (from->pp_recycle)
> > +             for (i = 0; i < from_shinfo->nr_frags; i++)
> > +                     page_pool_page_ref(skb_frag_page(&from_shinfo->frags[i]));
> > +     else
> > +             for (i = 0; i < from_shinfo->nr_frags; i++)
> > +                     __skb_frag_ref(&from_shinfo->frags[i]);
> >
> >       to->truesize += delta;
> >       to->len += len;
> >
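
For readers who want to see the reference-taking decision of page_pool_page_ref() in isolation, here is a simplified user-space model of the patch as posted. Everything in it is a stand-in: struct model_page, the MODEL_* constants and their values are placeholders rather than kernel definitions, the counters are plain integers instead of atomics, and compound_head() is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values for illustration only, not the kernel's. */
#define MODEL_PP_SIGNATURE      0x40UL
#define MODEL_PP_FLAG_PAGE_FRAG 0x4U

/* Stand-in for the few struct page fields the helper looks at. */
struct model_page {
	unsigned long pp_magic;   /* signature in the upper bits */
	unsigned int pool_flags;  /* stands in for page->pp->p.flags */
	long pp_frag_count;       /* page pool fragment count */
	int refcount;             /* ordinary page reference count */
};

/* Same test as in the patch: ignore the low two bits of pp_magic. */
static inline bool model_is_pp_page(const struct model_page *page)
{
	return (page->pp_magic & ~0x3UL) == MODEL_PP_SIGNATURE;
}

/* Take one extra reference: on the fragment count for page pool
 * fragment pages, on the ordinary page refcount otherwise. */
static inline void model_page_ref(struct model_page *page)
{
	assert(page != NULL);

	if (model_is_pp_page(page) &&
	    (page->pool_flags & MODEL_PP_FLAG_PAGE_FRAG))
		page->pp_frag_count++;
	else
		page->refcount++;
}
```

In this model a page pool page without the fragment flag still takes an ordinary page reference, which mirrors the else branch of the posted helper.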