Date: Sat, 31 Mar 2018 00:11:10 +0000
From: Saeed Mahameed <saeedm@...lanox.com>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, "bjorn.topel@...el.com" <bjorn.topel@...el.com>, "magnus.karlsson@...el.com" <magnus.karlsson@...el.com>, "brouer@...hat.com" <brouer@...hat.com>
CC: Gal Pressman <galp@...lanox.com>, "borkmann@...earbox.net" <borkmann@...earbox.net>, Tariq Toukan <tariqt@...lanox.com>, "john.fastabend@...il.com" <john.fastabend@...il.com>, Eran Ben Elisha <eranbe@...lanox.com>, "alexei.starovoitov@...il.com" <alexei.starovoitov@...il.com>, Eugenia Emantayev <eugenia@...lanox.com>, "jasowang@...hat.com" <jasowang@...hat.com>
Subject: Re: [net-next V7 PATCH 14/16] mlx5: use page_pool for xdp_return_frame call

On Thu, 2018-03-29 at 19:02 +0200, Jesper Dangaard Brouer wrote:
> This patch shows how it is possible to have both the driver-local page
> cache, which uses an elevated refcnt for "catching"/avoiding SKB
> put_page calls that would otherwise return the page through the page
> allocator, and, at the same time, pages getting returned to the
> page_pool from ndo_xdp_xmit DMA completion.
>
> The performance improvement for XDP_REDIRECT in this patch is really
> good, especially considering that (currently) the xdp_return_frame
> API and page_pool_put_page() do per-frame operations of both an
> rhashtable ID lookup and a locked return into the (page_pool) ptr_ring.
> (The plan is to remove these per-frame operations in a follow-up
> patchset.)
>
> The benchmark performed was RX on mlx5 and XDP_REDIRECT out ixgbe,
> with xdp_redirect_map (using devmap). The target/maximum capability
> of ixgbe is 13Mpps (on this HW setup).
>
> Before this patch, XDP-redirected frames on mlx5 were returned via
> the page allocator. The single-flow performance was 6Mpps, and if I
> started two flows the collective performance dropped to 4Mpps, because
> we hit the page allocator lock (further negative scaling occurs).
>
> Two test scenarios need to be covered for the xdp_return_frame API:
> DMA-TX completion running on the same CPU, and cross-CPU free/return.
> Results were same-CPU=10Mpps and cross-CPU=12Mpps. This is very
> close to our 13Mpps max target.
>
> The reason the max target isn't reached in the cross-CPU test is
> likely RX-ring DMA unmap/map overhead (which doesn't occur in
> ixgbe-to-ixgbe testing). It is also planned to remove this unnecessary
> DMA unmap in a later patchset.
>
> V2: Adjustments requested by Tariq
>  - Changed page_pool_create return codes to not return NULL, only
>    ERR_PTR, as this simplifies error handling in drivers.
>  - Save a branch in mlx5e_page_release
>  - Correct page_pool size calc for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ
>
> V5: Updated patch desc
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> Reviewed-by: Tariq Toukan <tariqt@...lanox.com>

Acked-by: Saeed Mahameed <saeedm@...lanox.com>
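
For readers following the thread, here is a minimal sketch of the dual release path the quoted commit message describes: pages consumed by the SKB path are dropped via put_page() (covered by the driver's elevated-refcnt local cache), while XDP pages are returned to the page_pool. The struct and function names are illustrative assumptions, not the actual mlx5e code, and it assumes the two-argument page_pool_put_page() as posted in this series:

#include <linux/mm.h>		/* put_page() */
#include <net/page_pool.h>	/* page_pool API header in this series */

/* Hypothetical per-RQ context, stand-in for the driver's own struct. */
struct example_rq {
	struct page_pool *page_pool;
};

/* Sketch only: release an RX page either back into the page_pool
 * (XDP/recycle path, the locked return into the ptr_ring mentioned
 * above) or through the page allocator (the SKB consumed it).
 */
static void example_page_release(struct example_rq *rq, struct page *page,
				 bool recycle)
{
	if (recycle)
		page_pool_put_page(rq->page_pool, page);
	else
		put_page(page);
}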
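
And a sketch of the page_pool_create() error handling the V2 note refers to (ERR_PTR instead of NULL). Parameter values and names here are illustrative assumptions rather than what the mlx5 patch actually uses:

#include <linux/err.h>		/* IS_ERR()/PTR_ERR() */
#include <linux/dma-direction.h>
#include <linux/topology.h>	/* cpu_to_node() */
#include <net/page_pool.h>

/* Sketch only: create a per-RQ page_pool sized to the RX ring. */
static struct page_pool *example_create_page_pool(struct device *dev,
						  int cpu, u32 pool_size)
{
	struct page_pool_params pp_params = { 0 };

	pp_params.order     = 0;			/* single pages */
	pp_params.flags     = PP_FLAG_DMA_MAP;		/* pool does DMA mapping */
	pp_params.pool_size = pool_size;		/* sized to the RQ */
	pp_params.nid       = cpu_to_node(cpu);
	pp_params.dev       = dev;
	pp_params.dma_dir   = DMA_FROM_DEVICE;

	/* Per the V2 note, page_pool_create() never returns NULL; on failure
	 * it returns an ERR_PTR, so callers only need an IS_ERR() check. */
	return page_pool_create(&pp_params);
}

A caller would then do pool = example_create_page_pool(dev, cpu, sz); if (IS_ERR(pool)) return PTR_ERR(pool); which is the simplification the V2 note mentions, since a separate NULL case no longer has to be handled.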