Message-ID: <20230331151712.2ccd8317@kernel.org>
Date: Fri, 31 Mar 2023 15:17:12 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc: davem@...emloft.net, brouer@...hat.com, netdev@...r.kernel.org,
edumazet@...gle.com, pabeni@...hat.com, hawk@...nel.org,
ilias.apalodimas@...aro.org
Subject: Re: [RFC net-next 1/2] page_pool: allow caching from safely
localized NAPI
On Fri, 31 Mar 2023 12:06:43 -0700 Jakub Kicinski wrote:
> > Makes sense, but do read the comment above struct pp_alloc_cache.
> > The sizing of pp_alloc_cache is important for this trick/heuristic to
> > work, meaning pp_alloc_cache must have enough room.
> > It is definitely on purpose that pp_alloc_cache has 128 elements and is
> > only refilled with 64 elements, which leaves room for this kind of
> > trick. But if Eric's deferred skb freeing has more than 64 pages to
> > free, then we will likely fall back to ptr_ring recycling.
> >
> > Code-wise, I suggest that you/we change page_pool_put_page_bulk() to
> > have a variant that takes 'allow_direct' (looking at the code below,
> > you might already do this, as this patch over-steers 'allow_direct').
> > Using the bulk API would then bulk into the ptr_ring in the cases
> > where we cannot use the direct cache.
>
> Interesting point, let me re-run some tests with the statistics enabled.
> For a simple stream test I think it may just be too steady to trigger
> over/underflow. Each skb will carry at most 18 pages, and the driver
> should only produce 64 packets / consume 64 pages. Each NAPI cycle will
> start by flushing the deferred free. So unless there is a hiccup on
> either the app or the NAPI side, the flows of pages in each direction
> should be steady enough to do well with just 128 cache entries. Let me
> get the data and report back.
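
For reference, the sizing Jesper refers to lives in include/net/page_pool.h;
roughly this (paraphrased from the source of that era, comments mine, so
treat it as a sketch rather than a verbatim quote):

/* The lockless per-pool allocation cache: it can hold up to 128 pages,
 * but the slow path only ever refills it with 64 pages at a time, so
 * at least 64 slots of headroom remain for direct recycling.
 */
#define PP_ALLOC_CACHE_SIZE	128
#define PP_ALLOC_CACHE_REFILL	64

struct pp_alloc_cache {
	u32 count;			/* pages currently in the cache */
	struct page *cache[PP_ALLOC_CACHE_SIZE];
};

Since a refill only ever brings the cache back up to 64 entries, the
other 64 slots are the headroom the direct-recycling trick relies on.
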
I patched the driver a bit to use the page pool for HW-GRO.
Below are the recycle stats with HW-GRO and with SW GRO + XDP_PASS
(one packet per page).
                          HW-GRO               page=page
                    before      after       before      after
recycle:
  cached:                0  138669686            0  150197505
  cache_full:            0     223391            0      74582
  ring:          138551933    9997191    149299454          0
  ring_full:             0        488         3154     127590
  released_refcnt:       0          0            0          0
alloc:
  fast:          136491361  148615710    146969587  150322859
  slow:               1772       1799          144        105
  slow_high_order:       0          0            0          0
  empty:              1772       1799          144        105
  refill:          2165245     156302      2332880       2128
  waive:                 0          0            0          0
Which seems to confirm that for this trivial test the cache sizing is
good enough, and I won't see any benefit from batching: the cache is
full only 0.16% and 0.05% of the time, respectively (cache_full vs
cached, i.e. 223391 / 138669686 and 74582 / 150197505).
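
To map those counters to the code: 'cached' and 'cache_full' are bumped
on the direct-recycling path in net/core/page_pool.c, which looks roughly
like this (paraphrased, comments mine):

static bool page_pool_recycle_in_cache(struct page *page,
				       struct page_pool *pool)
{
	/* No headroom left in the lockless cache: count it and fall
	 * back to ptr_ring recycling ('ring' / 'ring_full' above).
	 */
	if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE)) {
		recycle_stat_inc(pool, cache_full);
		return false;
	}

	/* Caller has already verified it is safe to touch the
	 * lockless cache (sole page reference, right context).
	 */
	pool->alloc.cache[pool->alloc.count++] = page;
	recycle_stat_inc(pool, cached);
	return true;
}

So the 0.16% / 0.05% above is just how often the first branch fires.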