Date:   Sat, 31 Mar 2018 00:11:10 +0000
From:   Saeed Mahameed <>
To:     "" <>,
        "" <>,
        "" <>,
        "" <>
CC:     Gal Pressman <>,
        "" <>,
        Tariq Toukan <>,
        "" <>,
        Eran Ben Elisha <>,
        "" <>,
        Eugenia Emantayev <>,
        "" <>
Subject: Re: [net-next V7 PATCH 14/16] mlx5: use page_pool for
 xdp_return_frame call

On Thu, 2018-03-29 at 19:02 +0200, Jesper Dangaard Brouer wrote:
> This patch shows how it is possible to have both the driver-local
> page cache, which uses an elevated refcnt for "catching"/avoiding
> SKB put_page from returning the page through the page allocator, and
> at the same time have pages getting returned to the page_pool from
> ndo_xdp_xmit DMA completion.
> The performance improvement for XDP_REDIRECT in this patch is really
> good.  Especially considering that (currently) the xdp_return_frame
> API and page_pool_put_page() do per-frame operations of both an
> rhashtable ID-lookup and a locked return into the (page_pool)
> ptr_ring.  (The plan is to remove these per-frame operations in a
> followup patchset.)
> The benchmark performed was RX on mlx5 and XDP_REDIRECT out ixgbe,
> with xdp_redirect_map (using devmap).  The target/maximum capability
> of ixgbe is 13Mpps (on this HW setup).
> Before this patch for mlx5, XDP-redirected frames were returned via
> the page allocator.  The single-flow performance was 6Mpps, and with
> two flows the collective performance dropped to 4Mpps, because we
> hit the page allocator lock (further negative scaling occurs).
> Two test scenarios need to be covered for the xdp_return_frame API:
> DMA-TX completion free/return running on the same CPU, and cross-CPU.
> Results were same-CPU=10Mpps and cross-CPU=12Mpps.  This is very
> close to our 13Mpps max target.
> The reason the max target isn't reached in the cross-CPU test is
> likely RX-ring DMA unmap/map overhead (which doesn't occur in ixgbe
> to ixgbe testing).  It is also planned to remove this unnecessary
> DMA unmap in a later patchset.
> V2: Adjustments requested by Tariq
>  - Changed page_pool_create to return ERR_PTR instead of NULL on
>    failure, as this simplifies err handling in drivers.
>  - Save a branch in mlx5e_page_release
>  - Correct page_pool size calc for
> V5: Updated patch desc
> Signed-off-by: Jesper Dangaard Brouer <>
> Reviewed-by: Tariq Toukan <>
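
[Editor's note: the ERR_PTR convention mentioned in the V2 changelog
item can be sketched in plain C.  This is a minimal user-space
illustration, not the kernel API: the pool struct, the
page_pool_create() stand-in, and the setup_pool() helper are
hypothetical, and ERR_PTR/IS_ERR/PTR_ERR are simplified
re-implementations of the kernel's err.h helpers.]

```c
#include <errno.h>
#include <stdint.h>

/* Simplified versions of the kernel's err.h helpers: an error code is
 * encoded in the last page of the address space, so one pointer value
 * can carry either a valid object or a -errno. */
#define MAX_ERRNO 4095
#define ERR_PTR(err)  ((void *)(intptr_t)(err))
#define PTR_ERR(ptr)  ((long)(intptr_t)(ptr))
#define IS_ERR(ptr)   ((uintptr_t)(ptr) >= (uintptr_t)-MAX_ERRNO)

/* Hypothetical stand-ins, not the real page_pool API. */
struct page_pool { int size; };
static struct page_pool pool_storage;

/* Fails with ERR_PTR(-EINVAL) instead of NULL for a bad size,
 * mirroring the V2 change described in the changelog. */
static struct page_pool *page_pool_create(int size)
{
	if (size <= 0)
		return ERR_PTR(-EINVAL);
	pool_storage.size = size;
	return &pool_storage;
}

/* Driver-side pattern: one IS_ERR() check covers every failure mode,
 * and the -errno propagates without a separate NULL test. */
static long setup_pool(int size, struct page_pool **out)
{
	struct page_pool *pp = page_pool_create(size);

	if (IS_ERR(pp))
		return PTR_ERR(pp);
	*out = pp;
	return 0;
}
```

With this convention a driver needs a single IS_ERR() check rather
than testing both NULL and an error pointer, which is the
simplification the changelog refers to.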

Acked-by: Saeed Mahameed <>
