Message-ID: <20180331141155.3e781694@redhat.com>
Date: Sat, 31 Mar 2018 14:11:55 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: netdev@...r.kernel.org,
Björn Töpel <bjorn.topel@...el.com>,
magnus.karlsson@...el.com
Cc: eugenia@...lanox.com, Jason Wang <jasowang@...hat.com>,
John Fastabend <john.fastabend@...il.com>,
Eran Ben Elisha <eranbe@...lanox.com>,
Saeed Mahameed <saeedm@...lanox.com>, galp@...lanox.com,
Daniel Borkmann <borkmann@...earbox.net>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Tariq Toukan <tariqt@...lanox.com>, brouer@...hat.com
Subject: Re: [net-next V8 PATCH 14/16] mlx5: use page_pool for xdp_return_frame call
On Sat, 31 Mar 2018 14:06:57 +0200 Jesper Dangaard Brouer <brouer@...hat.com> wrote:
> This patch shows how it is possible to have both the driver-local page
> cache, which uses an elevated refcnt to "catch"/avoid SKB put_page
> returning the page through the page allocator, and at the same time
> have pages returned to the page_pool from ndo_xdp_xmit DMA completion.
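
As a rough illustration of the two return paths (not the actual mlx5
code; the function names below are made-up placeholders), assuming the
xdp_frame based xdp_return_frame() API introduced earlier in this
series:

  #include <linux/mm.h>
  #include <net/xdp.h>

  /* DMA-TX completion of a frame sent via ndo_xdp_xmit: the frame came
   * from an RX ring registered as MEM_TYPE_PAGE_POOL, so
   * xdp_return_frame() hands the page back to that ring's page_pool
   * instead of the page allocator.
   */
  static void example_xdp_tx_complete(struct xdp_frame *xdpf)
  {
          xdp_return_frame(xdpf);
  }

  /* Driver-local RX page cache: take an extra reference before handing
   * the page to the SKB, so the stack's put_page() only drops a ref.
   * The page is safe to recycle once the driver's ref is the last one.
   */
  static void example_rx_cache_hold(struct page *page)
  {
          page_ref_inc(page);     /* "catch" the SKB's later put_page() */
  }

  static bool example_rx_cache_reusable(struct page *page)
  {
          return page_ref_count(page) == 1; /* only the driver's ref left */
  }
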
>
> The performance improvement for XDP_REDIRECT in this patch is really
> good, especially considering that (currently) the xdp_return_frame
> API and page_pool_put_page() do per-frame operations: both an
> rhashtable ID-lookup and a locked return into the (page_pool)
> ptr_ring. (The plan is to remove these per-frame operations in a
> followup patchset.)
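
To make the per-frame cost concrete: an RCU rhashtable lookup maps the
frame's mem.id to its page_pool (that part lives in net/core/xdp.c and
is not shown here), and the page is then pushed back into the pool's
ptr_ring, taking the producer lock for every single page. A minimal
sketch of that last step, using the generic ptr_ring API:

  #include <linux/mm_types.h>
  #include <linux/ptr_ring.h>

  /* One locked producer operation per returned frame; this is the cost
   * the planned followup wants to amortize by bulking returns.
   */
  static int example_recycle_into_ring(struct ptr_ring *ring,
                                       struct page *page)
  {
          /* ptr_ring_produce() takes ring->producer_lock internally */
          return ptr_ring_produce(ring, page);
  }
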
>
> The benchmark performed was RX on mlx5 and XDP_REDIRECT out on ixgbe,
> with xdp_redirect_map (using devmap). The target/maximum capability
> of ixgbe is 13Mpps (on this HW setup).
>
> Before this patch, mlx5's XDP-redirected frames were returned via the
> page allocator. The single-flow performance was 6Mpps, and if I
> started two flows the collective performance dropped to 4Mpps,
> because we hit the page allocator lock (further negative scaling
> occurs).
>
> Two test scenarios need to be covered for the xdp_return_frame API:
> the DMA-TX completion free/return running either same-CPU or
> cross-CPU. Results were same-CPU=10Mpps and cross-CPU=12Mpps, which
> is very close to our 13Mpps max target.
>
> The reason the max target isn't reached in the cross-CPU test is
> likely RX-ring DMA unmap/map overhead (which doesn't occur in
> ixgbe-to-ixgbe testing). It is also planned to remove this
> unnecessary DMA unmap in a later patchset.
>
> V2: Adjustments requested by Tariq
> - Changed page_pool_create to return ERR_PTR instead of NULL on
>   failure, as this simplifies error handling in drivers (see the
>   sketch below).
> - Save a branch in mlx5e_page_release
> - Correct page_pool size calc for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ
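
A minimal sketch of the driver-side error handling this enables (the
page_pool_params fields follow the API introduced earlier in this
series; the values and the helper name are illustrative, not the
actual mlx5 setup):

  #include <linux/device.h>
  #include <linux/dma-direction.h>
  #include <linux/err.h>
  #include <net/page_pool.h>

  /* page_pool_create() now returns ERR_PTR() on failure instead of
   * NULL, so a driver can propagate the real error code directly.
   */
  static int example_open_page_pool(struct device *dev,
                                    unsigned int pool_size, int node,
                                    struct page_pool **out)
  {
          struct page_pool_params pp_params = {
                  .order          = 0,
                  .flags          = PP_FLAG_DMA_MAP,
                  .pool_size      = pool_size,
                  .nid            = node,
                  .dev            = dev,
                  .dma_dir        = DMA_FROM_DEVICE,
          };
          struct page_pool *pool = page_pool_create(&pp_params);

          if (IS_ERR(pool))
                  return PTR_ERR(pool);   /* no NULL check needed */

          *out = pool;
          return 0;
  }
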
>
> V5: Updated patch desc
>
> V8: Adjust for b0cedc844c00 ("net/mlx5e: Remove rq_headroom field from params")
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> Reviewed-by: Tariq Toukan <tariqt@...lanox.com>
I forgot to add Saeed's previous ACK from V7:
Acked-by: Saeed Mahameed <saeedm@...lanox.com>
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer