Message-ID: <20210128150644.78b981cb@carbon>
Date:   Thu, 28 Jan 2021 15:06:44 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Lorenzo Bianconi <lorenzo@...nel.org>
Cc:     bpf@...r.kernel.org, netdev@...r.kernel.org, davem@...emloft.net,
        kuba@...nel.org, ast@...nel.org, daniel@...earbox.net,
        toshiaki.makita1@...il.com, lorenzo.bianconi@...hat.com,
        toke@...hat.com, brouer@...hat.com
Subject: Re: [PATCH bpf-next 1/3] net: veth: introduce bulking for XDP_PASS

On Tue, 26 Jan 2021 19:41:59 +0100
Lorenzo Bianconi <lorenzo@...nel.org> wrote:

> Introduce bulking support for XDP_PASS verdict forwarding skbs to
> the networking stack
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@...nel.org>
> ---
>  drivers/net/veth.c | 43 ++++++++++++++++++++++++++-----------------
>  1 file changed, 26 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 6e03b619c93c..23137d9966da 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -35,6 +35,7 @@
>  #define VETH_XDP_HEADROOM	(XDP_PACKET_HEADROOM + NET_IP_ALIGN)
>  
>  #define VETH_XDP_TX_BULK_SIZE	16
> +#define VETH_XDP_BATCH		8
>

I suspect that VETH_XDP_BATCH = 8 is not the optimal value.

You have taken this value from the CPUMAP code, but it cannot be
generalized to this case.  The optimal value for CPUMAP is actually to
bulk dequeue 16 frames from the ptr_ring, but one of the loops issues a
prefetch per frame, and that loop should not cover more than 10 frames,
because the Intel Line Fill Buffer cannot have more than 10 outstanding
prefetches in flight.
(Yes, I measured this[1] with perf stat when coding that.)
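
To illustrate the pattern being discussed (a simplified sketch, not the
actual CPUMAP code; BULK_SIZE, PREFETCH_MAX and 'ring' are just
illustrative names here):

	#define BULK_SIZE	16	/* bulk dequeue size from the ptr_ring */
	#define PREFETCH_MAX	10	/* stay within the ~10 LFB slots */

	void *frames[BULK_SIZE];
	int i, n;

	/* Dequeue a whole batch in one go to amortize ptr_ring overhead */
	n = __ptr_ring_consume_batched(ring, frames, BULK_SIZE);

	/* Separate prefetch loop, capped at the LFB limit */
	for (i = 0; i < n && i < PREFETCH_MAX; i++)
		prefetchw(virt_to_page(frames[i]));

	for (i = 0; i < n; i++) {
		struct xdp_frame *xdpf = frames[i];

		/* ... run XDP / build the skb for xdpf ... */
	}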

Could you please test with 16, to see if results are better?
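
I.e. something like this on top of your patch (untested, just to show
what I mean):

	-#define VETH_XDP_BATCH		8
	+#define VETH_XDP_BATCH		16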

In this veth case, the processing will likely run on the same CPU that
received the xdp_frames.  Thus, things are likely hot in cache, and we
don't have to care so much about moving cachelines across CPUs.  So, I
don't expect it will make much of a difference.


[1] https://github.com/xdp-project/xdp-project/blob/master/areas/cpumap/cpumap02-optimizations.org
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
