Message-ID: <20160908093833.58101878@redhat.com>
Date: Thu, 8 Sep 2016 09:38:33 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Or Gerlitz <gerlitz.or@...il.com>
Cc: Saeed Mahameed <saeedm@...lanox.com>,
iovisor-dev <iovisor-dev@...ts.iovisor.org>,
Linux Netdev List <netdev@...r.kernel.org>,
Tariq Toukan <tariqt@...lanox.com>,
Brenden Blanco <bblanco@...mgrid.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Tom Herbert <tom@...bertland.com>,
Martin KaFai Lau <kafai@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Eric Dumazet <edumazet@...gle.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Rana Shahout <ranas@...lanox.com>, brouer@...hat.com
Subject: Re: [PATCH RFC 08/11] net/mlx5e: XDP fast RX drop bpf programs
support
On Wed, 7 Sep 2016 23:55:42 +0300
Or Gerlitz <gerlitz.or@...il.com> wrote:
> On Wed, Sep 7, 2016 at 3:42 PM, Saeed Mahameed <saeedm@...lanox.com> wrote:
> > From: Rana Shahout <ranas@...lanox.com>
> >
> > Add support for the BPF_PROG_TYPE_PHYS_DEV hook in mlx5e driver.
> >
> > When XDP is on, we make sure to change the channels' RQ type to
> > MLX5_WQ_TYPE_LINKED_LIST rather than the "striding RQ" type, to
> > ensure "page per packet".
> >
> > On XDP set, we fail if HW LRO is enabled and ask the user to turn it
> > off. Since on ConnectX4-LX HW LRO is always on by default, this will be
> > annoying, but we prefer not to force LRO off from the XDP set function.
> >
> > Full channels reset (close/open) is required only when setting XDP
> > on/off.
> >
> > When XDP set is called just to exchange programs, we update each
> > RQ's xdp program on the fly. To synchronize with the current RX
> > data path activity of that RQ, we temporarily disable the RQ and
> > ensure the RX path is not running, then quickly update and
> > re-enable it. For that we do:
> > - rq.state = disabled
> > - napi_synchronize
> > - xchg(rq->xdp_prog)
> > - rq.state = enabled
> > - napi_schedule // Just in case we've missed an IRQ
> >
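In pseudo-C, that sequence boils down to something like the sketch
below (just an illustration, not the actual patch code; the struct and
field names, e.g. MLX5E_RQ_STATE_ENABLED, are assumptions here, and the
usual driver headers are presumed):

    static void rq_swap_xdp_prog(struct mlx5e_rq *rq, struct bpf_prog *prog)
    {
            struct bpf_prog *old_prog;

            /* 1. Mark the RQ disabled so the RX path stops processing it */
            clear_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);

            /* 2. Wait for any in-flight NAPI poll on this channel to finish */
            napi_synchronize(&rq->channel->napi);

            /* 3. Atomically exchange the program pointer and drop the old ref */
            old_prog = xchg(&rq->xdp_prog, prog);
            if (old_prog)
                    bpf_prog_put(old_prog);

            /* 4. Re-enable the RQ */
            set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);

            /* 5. Kick NAPI in case an IRQ was missed while disabled */
            napi_schedule(&rq->channel->napi);
    }
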
> > Packet rate performance testing was done with pktgen sending 64B
> > packets on the TX side, comparing a TC drop action on the RX side
> > against XDP fast drop.
> >
> > CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
> >
> > Comparison is done between:
> > 1. Baseline, Before this patch with TC drop action
> > 2. This patch with TC drop action
> > 3. This patch with XDP RX fast drop
> >
> > Streams    Baseline (TC drop)    TC drop     XDP fast drop
> > --------------------------------------------------------------
> > 1          5.51Mpps              5.14Mpps    13.5Mpps
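For reference, the "XDP fast drop" case above only needs a trivial BPF
program that unconditionally returns XDP_DROP, along these lines (the
function and section names here are arbitrary; the loader just needs to
find the section):

    #include <linux/bpf.h>

    /* Minimal XDP program: drop every packet as early as possible */
    __attribute__((section("xdp_drop"), used))
    int xdp_drop_prog(struct xdp_md *ctx)
    {
            return XDP_DROP;
    }

    char _license[] __attribute__((section("license"), used)) = "GPL";
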
>
> This (13.5 Mpps) is less than 50% of the result we presented at the
> XDP summit, which was obtained by Rana. Please see if/how much this
> grows if you use more sender threads, but have all of them xmit the
> same stream/flows, so we're on one ring. That (XDP with a single RX
> ring getting packets from N remote TX rings) would be your canonical
> baseline for any further numbers.
Well, my experiments with this hardware (mlx5/CX4 at 50Gbit/s) show
that you should be able to reach 23Mpps on a single CPU. This is
an XDP-drop simulation with order-0 pages being recycled through my
page_pool code, plus avoiding the cache misses (notice you are using a
CPU E5-2680 with DDIO, thus you should only see an L3 cache miss).
The 23Mpps number looks like some HW limitation, as the increase is
not proportional to the page-allocator overhead I removed (and the CPU
freq starts to decrease). I also did scaling tests to more CPUs, which
showed it scaling up to 40Mpps (you reported 45M). And at the PHY RX
level I see 60Mpps (the 50G line-rate max is 74Mpps).
Notice this is a significant improvement over the mlx4/CX3-pro HW,
which only scales up to 20Mpps in total, although it can also do
20Mpps XDP-drop on a single core.
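
(For anyone curious, the recycling idea is roughly the following: keep
a small per-RQ cache of pages and put a page back when the driver still
holds the only reference, instead of going through the page allocator.
This is only a rough sketch of the concept, not my actual page_pool
code; the struct and field names are made up for illustration.)

    struct rq_page_cache {
            struct page *pages[128];
            unsigned int head, tail;
    };

    static struct page *rq_page_get(struct rq_page_cache *c)
    {
            if (c->head != c->tail) {
                    struct page *p = c->pages[c->tail];

                    c->tail = (c->tail + 1) & 127;
                    return p;               /* recycled, no allocator call */
            }
            return dev_alloc_page();        /* fall back to the page allocator */
    }

    static void rq_page_release(struct rq_page_cache *c, struct page *p)
    {
            unsigned int next = (c->head + 1) & 127;

            /* Recycle only if we hold the last reference and there is room */
            if (page_ref_count(p) == 1 && next != c->tail) {
                    c->pages[c->head] = p;
                    c->head = next;
                    return;
            }
            put_page(p);
    }
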
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer