[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a1884884db58d09a31611a5ed4c4e0ab97dd35c4.camel@redhat.com>
Date: Wed, 28 Sep 2022 09:11:10 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: netdev <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Alexander H Duyck <alexanderduyck@...com>
Subject: Re: [PATCH net-next v3] net: skb: introduce and use a single page
frag cache
On Tue, 2022-09-27 at 21:05 -0700, Eric Dumazet wrote:
> On Fri, Sep 23, 2022 at 10:14 AM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> > After commit 3226b158e67c ("net: avoid 32 x truesize under-estimation
> > for tiny skbs") we are observing 10-20% regressions in performance
> > tests with small packets. The perf trace points to high pressure on
> > the slab allocator.
> >
> > This change tries to improve the allocation schema for small packets
> > using an idea originally suggested by Eric: a new per CPU page frag is
> > introduced and used in __napi_alloc_skb to cope with small allocation
> > requests.
> >
> > To ensure that the above does not lead to excessive truesize
> > underestimation, the frag size for small allocation is inflated to 1K
> > and all the above is restricted to build with 4K page size.
> >
> > Note that we need to update accordingly the run-time check introduced
> > with commit fd9ea57f4e95 ("net: add napi_get_frags_check() helper").
> >
> > Alex suggested a smart page refcount schema to reduce the number
> > of atomic operations and deal properly with pfmemalloc pages.
> >
> > Under small packet UDP flood, I measure a 15% peak tput increases.
> >
> > Suggested-by: Eric Dumazet <eric.dumazet@...il.com>
> > Suggested-by: Alexander H Duyck <alexanderduyck@...com>
> > Signed-off-by: Paolo Abeni <pabeni@...hat.com>
> > ---
> > v2 -> v3:
> > - updated Alex email address
> > - fixed build with !NAPI_HAS_SMALL_PAGE_FRAG
> >
> > v1 -> v2:
> > - better page_frag_alloc_1k() (Alex & Eric)
> > - avoid duplicate code and gfp_flags misuse in __napi_alloc_skb() (Alex)
> > ---
> > include/linux/netdevice.h | 1 +
> > net/core/dev.c | 17 ------
> > net/core/skbuff.c | 112 ++++++++++++++++++++++++++++++++++++--
> > 3 files changed, 108 insertions(+), 22 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 9f42fc871c3b..a1938560192a 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3822,6 +3822,7 @@ void netif_receive_skb_list(struct list_head *head);
> > gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb);
> > void napi_gro_flush(struct napi_struct *napi, bool flush_old);
> > struct sk_buff *napi_get_frags(struct napi_struct *napi);
> > +void napi_get_frags_check(struct napi_struct *napi);
> > gro_result_t napi_gro_frags(struct napi_struct *napi);
> > struct packet_offload *gro_find_receive_by_type(__be16 type);
> > struct packet_offload *gro_find_complete_by_type(__be16 type);
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index d66c73c1c734..fa53830d0683 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -6358,23 +6358,6 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
> > }
> > EXPORT_SYMBOL(dev_set_threaded);
> >
> > -/* Double check that napi_get_frags() allocates skbs with
> > - * skb->head being backed by slab, not a page fragment.
> > - * This is to make sure bug fixed in 3226b158e67c
> > - * ("net: avoid 32 x truesize under-estimation for tiny skbs")
> > - * does not accidentally come back.
> > - */
> > -static void napi_get_frags_check(struct napi_struct *napi)
> > -{
> > - struct sk_buff *skb;
> > -
> > - local_bh_disable();
> > - skb = napi_get_frags(napi);
> > - WARN_ON_ONCE(skb && skb->head_frag);
> > - napi_free_frags(napi);
> > - local_bh_enable();
> > -}
> > -
> > void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
> > int (*poll)(struct napi_struct *, int), int weight)
> > {
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index f1b8b20fc20b..e7578549a561 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -134,8 +134,73 @@ static void skb_under_panic(struct sk_buff *skb, unsigned int sz, void *addr)
> > #define NAPI_SKB_CACHE_BULK 16
> > #define NAPI_SKB_CACHE_HALF (NAPI_SKB_CACHE_SIZE / 2)
> >
> > +/* the compiler doesn't like 'SKB_TRUESIZE(GRO_MAX_HEAD) > 512', but we
> > + * can imply such condition checking the double word and MAX_HEADER size
> > + */
> > +#if PAGE_SIZE == SZ_4K && (defined(CONFIG_64BIT) || MAX_HEADER > 64)
>
> This looks quite convoluted to me.
> It hides the relation between GRO_MAX_HEAD and MAX_HEADER ?
> Is anyone going to notice if we consume 1K instead of 512 bytes on
> 32bit arches, and when LL_MAX_HEADER==32
> and no tunnel is enabled ?
Indeed I was a little doubious about the above, as I agree it's quite
convoluted. In the end I preferred a "better safe than sorry" approach,
but I agree that is possibly too much conservative.
> I would simply use
>
> #if PAGE_SIZE == SZ_4K
I'll do in the next revision.
> > +
> > +#define NAPI_HAS_SMALL_PAGE_FRAG 1
> > +#define NAPI_SMALL_PAGE_PFMEMALLOC(nc) ((nc).pfmemalloc)
> > +
> > +/* specializzed page frag allocator using a single order 0 page
>
> specialized
Thanks,
Paolo
Powered by blists - more mailing lists