Message-ID: <AS8PR04MB9176281109667B36CB694763895A9@AS8PR04MB9176.eurprd04.prod.outlook.com>
Date: Tue, 4 Oct 2022 13:12:40 +0000
From: Shenwei Wang <shenwei.wang@....com>
To: Jesper Dangaard Brouer <jbrouer@...hat.com>,
Andrew Lunn <andrew@...n.ch>
CC: "brouer@...hat.com" <brouer@...hat.com>,
Joakim Zhang <qiangqing.zhang@....com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"imx@...ts.linux.dev" <imx@...ts.linux.dev>,
Magnus Karlsson <magnus.karlsson@...il.com>,
Björn Töpel <bjorn@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>
Subject: RE: [EXT] Re: [PATCH 1/1] net: fec: add initial XDP support
> -----Original Message-----
> From: Jesper Dangaard Brouer <jbrouer@...hat.com>
> Sent: Tuesday, October 4, 2022 6:22 AM
> To: Shenwei Wang <shenwei.wang@....com>; Jesper Dangaard Brouer
> <jbrouer@...hat.com>; Andrew Lunn <andrew@...n.ch>
> Cc: brouer@...hat.com; Joakim Zhang <qiangqing.zhang@....com>; David S.
> Miller <davem@...emloft.net>; Eric Dumazet <edumazet@...gle.com>; Jakub
> Kicinski <kuba@...nel.org>; Paolo Abeni <pabeni@...hat.com>; Alexei
> Starovoitov <ast@...nel.org>; Daniel Borkmann <daniel@...earbox.net>;
> Jesper Dangaard Brouer <hawk@...nel.org>; John Fastabend
> <john.fastabend@...il.com>; netdev@...r.kernel.org; linux-
> kernel@...r.kernel.org; imx@...ts.linux.dev; Magnus Karlsson
> <magnus.karlsson@...il.com>; Björn Töpel <bjorn@...nel.org>; Ilias
> Apalodimas <ilias.apalodimas@...aro.org>
> Subject: Re: [EXT] Re: [PATCH 1/1] net: fec: add initial XDP support
>
> Caution: EXT Email
>
> On 03/10/2022 14.49, Shenwei Wang wrote:
> > Hi Jesper,
> >
> >>>> On mvneta driver/platform we saw huge speedup replacing:
> >>>>
> >>>> page_pool_release_page(rxq->page_pool, page); with
> >>>> skb_mark_for_recycle(skb);
> >>>>
> >
> > After replacing the page_pool_release_page with the
> > skb_mark_for_recycle, I found something that confused me a little in
> > the testing result. I tested with the sample app of "xdpsock" under
> > two modes:
> > 1. Native (xdpsock -i eth0).
> > 2. Skb-mode (xdpsock -S -i eth0).
> Great that you are also testing AF_XDP, but do you have a particular use-case
> that needs AF_XDP on this board?
The purpose is to provide our customers an alternative solution to the current DPDK implementation.
>
> What packet size is used in the results below?
The packets were generated by pktgen_sample03_burst_single_flow.sh; the packet size is 60 bytes with 0 frags.
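(For reference, not from the original report: pktgen_sample03_burst_single_flow.sh lives in samples/pktgen/ in the kernel tree. A typical invocation looks roughly like the line below; the interface, destination IP and MAC are placeholders, not the ones used in this test.)

  # ./pktgen_sample03_burst_single_flow.sh -i eth1 -d <dst-ip> -m <dst-mac> -s 60 -t 1

That generates a single flow of minimum-size UDP packets, which matches the 60-byte / 0-frags numbers above.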
>
> > The following are the testing result:
> >
> >              With page_pool_release_page (pps)   With skb_mark_for_recycle (pps)
> > SKB-Mode     90K                                  200K
> > Native       190K                                 190K
> >
>
> The default AF_XDP test with xdpsock is rxdrop IIRC.
>
> Can you test the normal XDP code path and do an XDP_DROP test via the samples
> tool 'xdp_rxq_info' with the cmdline:
>
> sudo ./xdp_rxq_info --dev eth42 --act XDP_DROP --read
>
> And then same with --skb-mode
I haven't tested xdp_rxq_info yet, and will give it a try later today. However, for the XDP_DROP test, I
did try the xdp2 test case, and the result looks reasonable: the performance of native mode is much higher than that of skb-mode.
# xdp2 eth0
proto 0: 475362 pkt/s
# xdp2 -S eth0 (page_pool_release_page solution)
proto 17: 71999 pkt/s
# xdp2 -S eth0 (skb_mark_for_recycle solution)
proto 17: 72228 pkt/s
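For anyone skimming the thread: the change being compared in the two "-S" runs above is essentially the swap sketched below in the rx path. This is a minimal sketch with made-up context; the struct, helper and queue names are illustrative and not the actual fec driver code.

/* Sketch only -- illustrative rx completion fragment, not fec_main.c.
 * Needs <linux/netdevice.h>, <linux/skbuff.h> and <net/page_pool.h>.
 */
struct rx_queue {
	struct page_pool *page_pool;
	struct napi_struct napi;
};

static void rx_deliver_skb(struct rx_queue *rxq, struct page *page,
			   unsigned int len)
{
	struct sk_buff *skb;

	skb = build_skb(page_address(page), PAGE_SIZE);
	if (!skb) {
		/* Return the page to the pool if we cannot build an skb. */
		page_pool_put_full_page(rxq->page_pool, page, true);
		return;
	}

	skb_reserve(skb, XDP_PACKET_HEADROOM);	/* rx headroom/offset */
	skb_put(skb, len);

	/* Old: unmap and hand the page back to the page allocator.
	 *	page_pool_release_page(rxq->page_pool, page);
	 *
	 * New: keep the DMA mapping; when the stack frees the skb, the
	 *	page is returned to the page_pool for recycling.
	 */
	skb_mark_for_recycle(skb);

	napi_gro_receive(&rxq->napi, skb);
}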
>
> > The skb_mark_for_recycle solution boosted the performance of SKB-Mode
> > to 200K+ PPS. That is even higher than the performance of the Native
> > solution. Is this result reasonable? Do you have any clue why the
> > SKB-Mode performance can go higher than that of the Native one?
> I might be able to explain this (Cc. AF_XDP maintainers to keep me honest).
>
> When you say "native" *AF_XDP*, that isn't Zero-Copy AF_XDP.
>
Right. Zero-copy hasn't been implemented yet.
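(Side note, from memory of the xdpsock sample's options and not something tested here: the sample can force copy vs. zero-copy mode with "xdpsock -i eth0 -c" and "xdpsock -i eth0 -z", so until ZC is implemented in fec only the copy path, as measured above, is available.)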
> Sure, XDP runs in native driver mode and redirects the raw frames into the
> AF_XDP socket, but this isn't zero-copy AF_XDP. Thus, the packets need to be
> copied into the AF_XDP buffers.
>
> As soon as the frame or SKB (for generic XDP) has been copied, it is
> released/freed by the AF_XDP/xsk code (either via xdp_return_buff() or
> consume_skb()). Thus, it looks like it really pays off to recycle the frame via
> page_pool, also for the SKB consume_skb() case.
>
> I am still a little surprised that it can be faster than native AF_XDP, as
> SKB-mode ("XDP-generic") needs to call through a lot more software layers and
> convert the SKB to look like an xdp_buff.
That's what I can't understand right now either.
Thanks very much for the explanations!
Shenwei
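
P.S. A note for readers on the PP_FLAG_DMA_SYNC_DEV / max_len point raised further down in the quoted thread: those are page_pool creation parameters. A minimal sketch of such a setup follows; the helper name and concrete values are illustrative, only FEC_ENET_RX_FRSIZE comes from the discussion, and this is not the actual patch code.

/* Sketch: create a page_pool that handles DMA mapping and device-side
 * sync (PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV). Needs <net/page_pool.h>.
 */
static struct page_pool *rx_create_page_pool(struct device *dev,
					     unsigned int pool_size)
{
	struct page_pool_params pp_params = {
		.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.order		= 0,
		.pool_size	= pool_size,
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
		.offset		= XDP_PACKET_HEADROOM,
		/* Sync only up to the largest rx frame we can receive,
		 * instead of the whole page.
		 */
		.max_len	= FEC_ENET_RX_FRSIZE,
	};

	return page_pool_create(&pp_params);
}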
>
> --Jesper
>
>
>
> >> -----Original Message-----
> >> From: Jesper Dangaard Brouer <jbrouer@...hat.com>
> >> Sent: Thursday, September 29, 2022 1:55 PM
> >> To: Shenwei Wang <shenwei.wang@....com>; Jesper Dangaard Brouer
> >> <jbrouer@...hat.com>; Andrew Lunn <andrew@...n.ch>
> >> Cc: brouer@...hat.com; Joakim Zhang <qiangqing.zhang@....com>; David
> S.
> >> Miller <davem@...emloft.net>; Eric Dumazet <edumazet@...gle.com>;
> >> Jakub Kicinski <kuba@...nel.org>; Paolo Abeni <pabeni@...hat.com>;
> >> Alexei Starovoitov <ast@...nel.org>; Daniel Borkmann
> >> <daniel@...earbox.net>; Jesper Dangaard Brouer <hawk@...nel.org>;
> >> John Fastabend <john.fastabend@...il.com>; netdev@...r.kernel.org;
> >> linux- kernel@...r.kernel.org; imx@...ts.linux.dev
> >> Subject: Re: [EXT] Re: [PATCH 1/1] net: fec: add initial XDP support
> >>
> >> Caution: EXT Email
> >>
> >> On 29/09/2022 17.52, Shenwei Wang wrote:
> >>>
> >>>> From: Jesper Dangaard Brouer <jbrouer@...hat.com>
> >>>>
> >>>> On 29/09/2022 15.26, Shenwei Wang wrote:
> >>>>>
> >>>>>> From: Andrew Lunn <andrew@...n.ch>
> >>>>>> Sent: Thursday, September 29, 2022 8:23 AM
> >>>> [...]
> >>>>>>
> >>>>>>> I actually did some compare testing regarding the page pool for
> >>>>>>> normal traffic. So far I don't see significant improvement in
> >>>>>>> the current implementation. The performance for large packets
> >>>>>>> improves a little, and the performance for small packets gets a
> >>>>>>> little worse.
> >>>>>>
> >>>>>> What hardware was this for? imx51? imx6? imx7 Vybrid? These all
> >>>>>> use the
> >> FEC.
> >>>>>
> >>>>> I tested on imx8qxp platform. It is ARM64.
> >>>>
> >>>> On mvneta driver/platform we saw huge speedup replacing:
> >>>>
> >>>> page_pool_release_page(rxq->page_pool, page); with
> >>>> skb_mark_for_recycle(skb);
> >>>>
> >>>> As I mentioned: Today page_pool has SKB recycle support (you might
> >>>> have looked at drivers that didn't utilize this yet), thus you
> >>>> don't need to release the page (page_pool_release_page) here.
> >>>> Instead you could simply mark the SKB for recycling, unless the
> >>>> driver does some page refcnt tricks I didn't notice.
> >>>>
> >>>> On the mvneta driver/platform the DMA unmap (in
> >>>> page_pool_release_page) was very expensive. This imx8qxp platform
> >>>> might have faster DMA unmap in case it is cache-coherent.
> >>>>
> >>>> I would be very interested in knowing if skb_mark_for_recycle()
> >>>> helps on this platform, for normal network stack performance.
> >>>>
> >>>
> >>> Did a quick compare testing for the following 3 scenarios:
> >>
> >> Thanks for doing this! :-)
> >>
> >>> 1. original implementation
> >>>
> >>> shenwei@...0:~$ iperf -c 10.81.16.245 -w 2m -i 1
> >>> ------------------------------------------------------------
> >>> Client connecting to 10.81.16.245, TCP port 5001 TCP window size:
> >>> 416 KByte (WARNING: requested 1.91 MByte)
> >>> ------------------------------------------------------------
> >>> [ 1] local 10.81.17.20 port 49154 connected with 10.81.16.245 port 5001
> >>> [ ID] Interval Transfer Bandwidth
> >>> [ 1] 0.0000-1.0000 sec 104 MBytes 868 Mbits/sec
> >>> [ 1] 1.0000-2.0000 sec 105 MBytes 878 Mbits/sec
> >>> [ 1] 2.0000-3.0000 sec 105 MBytes 881 Mbits/sec
> >>> [ 1] 3.0000-4.0000 sec 105 MBytes 879 Mbits/sec
> >>> [ 1] 4.0000-5.0000 sec 105 MBytes 878 Mbits/sec
> >>> [ 1] 5.0000-6.0000 sec 105 MBytes 878 Mbits/sec
> >>> [ 1] 6.0000-7.0000 sec 104 MBytes 875 Mbits/sec
> >>> [ 1] 7.0000-8.0000 sec 104 MBytes 875 Mbits/sec
> >>> [ 1] 8.0000-9.0000 sec 104 MBytes 873 Mbits/sec
> >>> [ 1] 9.0000-10.0000 sec 104 MBytes 875 Mbits/sec
> >>> [ 1] 0.0000-10.0073 sec 1.02 GBytes 875 Mbits/sec
> >>>
> >>> 2. Page pool with page_pool_release_page
> >>>
> >>> shenwei@...0:~$ iperf -c 10.81.16.245 -w 2m -i 1
> >>> ------------------------------------------------------------
> >>> Client connecting to 10.81.16.245, TCP port 5001 TCP window size:
> >>> 416 KByte (WARNING: requested 1.91 MByte)
> >>> ------------------------------------------------------------
> >>> [ 1] local 10.81.17.20 port 35924 connected with 10.81.16.245 port 5001
> >>> [ ID] Interval Transfer Bandwidth
> >>> [ 1] 0.0000-1.0000 sec 101 MBytes 849 Mbits/sec
> >>> [ 1] 1.0000-2.0000 sec 102 MBytes 860 Mbits/sec
> >>> [ 1] 2.0000-3.0000 sec 102 MBytes 860 Mbits/sec
> >>> [ 1] 3.0000-4.0000 sec 102 MBytes 859 Mbits/sec
> >>> [ 1] 4.0000-5.0000 sec 103 MBytes 863 Mbits/sec
> >>> [ 1] 5.0000-6.0000 sec 103 MBytes 864 Mbits/sec
> >>> [ 1] 6.0000-7.0000 sec 103 MBytes 863 Mbits/sec
> >>> [ 1] 7.0000-8.0000 sec 103 MBytes 865 Mbits/sec
> >>> [ 1] 8.0000-9.0000 sec 103 MBytes 862 Mbits/sec
> >>> [ 1] 9.0000-10.0000 sec 102 MBytes 856 Mbits/sec
> >>> [ 1] 0.0000-10.0246 sec 1.00 GBytes 858 Mbits/sec
> >>>
> >>>
> >>> 3. page pool with skb_mark_for_recycle
> >>>
> >>> shenwei@...0:~$ iperf -c 10.81.16.245 -w 2m -i 1
> >>> ------------------------------------------------------------
> >>> Client connecting to 10.81.16.245, TCP port 5001 TCP window size:
> >>> 416 KByte (WARNING: requested 1.91 MByte)
> >>> ------------------------------------------------------------
> >>> [ 1] local 10.81.17.20 port 42724 connected with 10.81.16.245 port 5001
> >>> [ ID] Interval Transfer Bandwidth
> >>> [ 1] 0.0000-1.0000 sec 111 MBytes 931 Mbits/sec
> >>> [ 1] 1.0000-2.0000 sec 112 MBytes 935 Mbits/sec
> >>> [ 1] 2.0000-3.0000 sec 111 MBytes 934 Mbits/sec
> >>> [ 1] 3.0000-4.0000 sec 111 MBytes 934 Mbits/sec
> >>> [ 1] 4.0000-5.0000 sec 111 MBytes 934 Mbits/sec
> >>> [ 1] 5.0000-6.0000 sec 112 MBytes 935 Mbits/sec
> >>> [ 1] 6.0000-7.0000 sec 111 MBytes 934 Mbits/sec
> >>> [ 1] 7.0000-8.0000 sec 111 MBytes 933 Mbits/sec
> >>> [ 1] 8.0000-9.0000 sec 112 MBytes 935 Mbits/sec
> >>> [ 1] 9.0000-10.0000 sec 111 MBytes 933 Mbits/sec
> >>> [ 1] 0.0000-10.0069 sec 1.09 GBytes 934 Mbits/sec
> >>
> >> This is a very significant performance improvement (page pool with
> >> skb_mark_for_recycle). This is very close to the max goodput for a 1Gbit/s
> >> link.
> >>
> >>
> >>> For small packet size (64 bytes), all three cases have almost the same result:
> >>>
> >>
> >> To me this indicates that the DMA map/unmap operations on this
> >> platform are indeed more expensive on larger packets. That is exactly
> >> what page_pool avoids by keeping the DMA mapping intact when recycling.
> >>
> >> The driver still needs DMA-sync, although I notice you set the page_pool
> >> feature flag PP_FLAG_DMA_SYNC_DEV; this is good as page_pool will try
> >> to reduce the sync size where possible. E.g. in this SKB case it will
> >> reduce the DMA-sync to max_len=FEC_ENET_RX_FRSIZE, which should also
> >> help performance.
> >>
> >>
> >>> shenwei@...0:~$ iperf -c 10.81.16.245 -w 2m -i 1 -l 64
> >>> ------------------------------------------------------------
> >>> Client connecting to 10.81.16.245, TCP port 5001 TCP window size:
> >>> 416 KByte (WARNING: requested 1.91 MByte)
> >>> ------------------------------------------------------------
> >>> [ 1] local 10.81.17.20 port 58204 connected with 10.81.16.245 port 5001
> >>> [ ID] Interval Transfer Bandwidth
> >>> [ 1] 0.0000-1.0000 sec 36.9 MBytes 309 Mbits/sec
> >>> [ 1] 1.0000-2.0000 sec 36.6 MBytes 307 Mbits/sec
> >>> [ 1] 2.0000-3.0000 sec 36.6 MBytes 307 Mbits/sec
> >>> [ 1] 3.0000-4.0000 sec 36.5 MBytes 307 Mbits/sec
> >>> [ 1] 4.0000-5.0000 sec 37.1 MBytes 311 Mbits/sec
> >>> [ 1] 5.0000-6.0000 sec 37.2 MBytes 312 Mbits/sec
> >>> [ 1] 6.0000-7.0000 sec 37.1 MBytes 311 Mbits/sec
> >>> [ 1] 7.0000-8.0000 sec 37.1 MBytes 311 Mbits/sec
> >>> [ 1] 8.0000-9.0000 sec 37.1 MBytes 312 Mbits/sec
> >>> [ 1] 9.0000-10.0000 sec 37.2 MBytes 312 Mbits/sec
> >>> [ 1] 0.0000-10.0097 sec 369 MBytes 310 Mbits/sec
> >>>
> >>> Regards,
> >>> Shenwei
> >>>
> >>>
> >>>>>> By small packets, do you mean those under the copybreak limit?
> >>>>>>
> >>>>>> Please provide some benchmark numbers with your next patchset.
> >>>>>
> >>>>> Yes, the packet size is 64 bytes and it is under the copybreak limit.
> >>>>> As the impact is not significant, I would prefer to remove the
> >>>>> copybreak logic.
> >>>>
> >>>> +1 to removing this logic if possible, due to maintenance cost.
> >>>>
> >>>> --Jesper
> >>>
> >