Message-ID: <CAKhg4t+RUeoTv_OnD5nMAXWeATqRC+tcyzbnz_jXBQGzd90rpQ@mail.gmail.com>
Date: Thu, 29 Jun 2023 20:19:52 +0800
From: Liang Chen <liangchen.linux@...il.com>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: ilias.apalodimas@...aro.org, hawk@...nel.org, kuba@...nel.org,
davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com,
netdev@...r.kernel.org
Subject: Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case
On Thu, Jun 29, 2023 at 8:17 PM Liang Chen <liangchen.linux@...il.com> wrote:
>
> On Thu, Jun 29, 2023 at 2:53 PM Yunsheng Lin <linyunsheng@...wei.com> wrote:
> >
> > On 2023/6/28 20:11, Liang Chen wrote:
> > > In order to address the issues encountered with commit 1effe8ca4e34
> > > ("skbuff: fix coalescing for page_pool fragment recycling"), the
> > > combination of the following conditions was excluded from skb coalescing:
> > >
> > > from->pp_recycle = 1
> > > from->cloned = 1
> > > to->pp_recycle = 1
> > >
> > > However, with page pool environments, the aforementioned combination can
> > > be quite common. In scenarios with a higher number of small packets, it
> > > can significantly affect the success rate of coalescing. For example,
> > > when considering packets of 256 bytes size, our comparison of coalescing
> > > success rate is as follows:
> >
> > skb_try_coalesce() only allows coalescing when the 'to' skb is not cloned.
> >
> > Could you give more detail about the testing when we have a non-cloned
> > 'to' skb and a cloned 'from' skb? Both of them should belong to the
> > same flow.
> >
> > I had the below patchset trying to do something similar as this patch does:
> > https://lore.kernel.org/all/20211009093724.10539-5-linyunsheng@huawei.com/
> >
> > It seems this patch only tries to optimize one specific case of skb
> > coalescing. If coalescing between a non-cloned and a cloned skb is a
> > common case, then it might be worth optimizing.
> >
>
> Sure, Thanks for the information! The testing is just a common iperf
> test as below.
>
> iperf3 -c <server IP> -i 5 -f g -t 0 -l 128
>
> We observed the frequency of each combination of the pp (page pool)
> and clone condition when entering skb_try_coalesce. The results
> motivated us to propose such an optimization, as we noticed that case
> 11 (from pp/clone=1/1 and to pp/clone = 1/0) occurs quite often.
>
> +-------------+--------------+--------------+--------------+--------------+
> | to \ from   | pp/clone=0/0 | pp/clone=0/1 | pp/clone=1/0 | pp/clone=1/1 |
> +-------------+--------------+--------------+--------------+--------------+
> |pp/clone=0/0 |      0       |      1       |      2       |      3       |
> |pp/clone=0/1 |      4       |      5       |      6       |      7       |
> |pp/clone=1/0 |      8       |      9       |      10      |      11      |
> |pp/clone=1/1 |      12      |      13      |      14      |      15      |
> +-------------+--------------+--------------+--------------+--------------+
>
>
> packet size 128:
> total : 152903
> 0 : 0 (0%)
> 1 : 0 (0%)
> 2 : 0 (0%)
> 3 : 0 (0%)
> 4 : 0 (0%)
> 5 : 0 (0%)
> 6 : 0 (0%)
> 7 : 0 (0%)
> 8 : 0 (0%)
> 9 : 0 (0%)
> 10 : 20681 (13%)
> 11 : 90136 (58%)
> 12 : 0 (0%)
> 13 : 0 (0%)
> 14 : 0 (0%)
> 15 : 42086 (27%)
>
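For readers following the numbering, here is a minimal sketch of how the case index and the percentages above could be computed. The bit layout is an assumption inferred from the table and from the "case 11" remark (rows are the 'to' skb, columns the 'from' skb), and all helper names are hypothetical, not part of the patch. The two pp_check_* helpers only mirror the condition changed by the skbuff.c hunk, so it is easy to see that case 11 was rejected by the old check and is let through by the new one.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not kernel code.
 * Assumed bit layout (inferred from the table):
 *   bit 3 = to->pp_recycle,   bit 2 = skb_cloned(to),
 *   bit 1 = from->pp_recycle, bit 0 = skb_cloned(from)
 */
static unsigned int coalesce_case(bool to_pp, bool to_cloned,
				  bool from_pp, bool from_cloned)
{
	return ((unsigned int)to_pp << 3) | ((unsigned int)to_cloned << 2) |
	       ((unsigned int)from_pp << 1) | (unsigned int)from_cloned;
}

/* pp_recycle check before the patch (the lines removed by the hunk). */
static bool pp_check_rejects_old(bool to_pp, bool from_pp, bool from_cloned)
{
	return to_pp != from_pp || (from_pp && from_cloned);
}

/* pp_recycle check after the patch (the line added by the hunk). */
static bool pp_check_rejects_new(bool to_pp, bool from_pp)
{
	return to_pp != from_pp;
}

/* Integer percentage, truncated toward zero as in the listing above. */
static unsigned long percent_of(unsigned long count, unsigned long total)
{
	assert(total > 0);
	return count * 100 / total;
}
```

With this layout, coalesce_case(true, false, true, true) gives 11, and percent_of(90136, 152903) gives 58. Note that skb_try_coalesce() has further checks (for example on a cloned 'to' skb) that this sketch does not model.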
> Thanks,
> Liang
>
>
> >
> > >
> > > Without page pool: 70%
> > > With page pool: 13%
> > >
> >
> > ...
> >
> > > diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> > > index 126f9e294389..05e5d8ead63b 100644
> > > --- a/include/net/page_pool.h
> > > +++ b/include/net/page_pool.h
> > > @@ -399,4 +399,25 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
> > > >  	page_pool_update_nid(pool, new_nid);
> > > >  }
> > >
> > > > +static inline bool page_pool_is_pp_page(struct page *page)
> > > > +{
> > > > +	return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
> > > > +}
> > > > +
> > > > +static inline bool page_pool_is_pp_page_frag(struct page *page)
> > > > +{
> > > > +	return !!(page->pp->p.flags & PP_FLAG_PAGE_FRAG);
> > > > +}
> > > > +
> > > > +static inline void page_pool_page_ref(struct page *page)
> > > > +{
> > > > +	struct page *head_page = compound_head(page);
> >
> > It seems we could avoid adding head_page here:
> > page = compound_head(page);
> >
Sure.
> > > +
> > > > +	if (page_pool_is_pp_page(head_page) &&
> > > > +	    page_pool_is_pp_page_frag(head_page))
> > > > +		atomic_long_inc(&head_page->pp_frag_count);
> > > > +	else
> > > > +		get_page(head_page);
> >
> > page_ref_inc() should be enough here instead of get_page(),
> > as compound_head() has already been called.
> >
Yeah, it will be changed to page_ref_inc() in v2.
> > > +}
> > > +
> > > #endif /* _NET_PAGE_POOL_H */
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index 6c5915efbc17..9806b091f0f6 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -5666,8 +5666,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > > >  	 * !@to->pp_recycle but its tricky (due to potential race with
> > > >  	 * the clone disappearing) and rare, so not worth dealing with.
> > > >  	 */
> > > > -	if (to->pp_recycle != from->pp_recycle ||
> > > > -	    (from->pp_recycle && skb_cloned(from)))
> > > > +	if (to->pp_recycle != from->pp_recycle)
> > > >  		return false;
> > > >
> > > >  	if (len <= skb_tailroom(to)) {
> > > @@ -5724,8 +5723,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > > >  	/* if the skb is not cloned this does nothing
> > > >  	 * since we set nr_frags to 0.
> > > >  	 */
> > > > -	for (i = 0; i < from_shinfo->nr_frags; i++)
> > > > -		__skb_frag_ref(&from_shinfo->frags[i]);
> > > > +	if (from->pp_recycle)
> > > > +		for (i = 0; i < from_shinfo->nr_frags; i++)
> > > > +			page_pool_page_ref(skb_frag_page(&from_shinfo->frags[i]));
> > > > +	else
> > > > +		for (i = 0; i < from_shinfo->nr_frags; i++)
> > > > +			__skb_frag_ref(&from_shinfo->frags[i]);
> > > >
> > > >  	to->truesize += delta;
> > > >  	to->len += len;
> > >