Date: Thu, 29 Sep 2022 20:55:20 +0200
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Shenwei Wang <shenwei.wang@....com>, Jesper Dangaard Brouer <jbrouer@...hat.com>, Andrew Lunn <andrew@...n.ch>
Cc: brouer@...hat.com, Joakim Zhang <qiangqing.zhang@....com>, "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, Jesper Dangaard Brouer <hawk@...nel.org>, John Fastabend <john.fastabend@...il.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "imx@...ts.linux.dev" <imx@...ts.linux.dev>
Subject: Re: [EXT] Re: [PATCH 1/1] net: fec: add initial XDP support

On 29/09/2022 17.52, Shenwei Wang wrote:
>
>> From: Jesper Dangaard Brouer <jbrouer@...hat.com>
>>
>> On 29/09/2022 15.26, Shenwei Wang wrote:
>>>
>>>> From: Andrew Lunn <andrew@...n.ch>
>>>> Sent: Thursday, September 29, 2022 8:23 AM
>> [...]
>>>>
>>>>> I actually did some compare testing regarding the page pool for
>>>>> normal traffic. So far I don't see significant improvement in the
>>>>> current implementation. The performance for large packets improves a
>>>>> little, and the performance for small packets gets a little worse.
>>>>
>>>> What hardware was this for? imx51? imx6? imx7? Vybrid? These all use the FEC.
>>>
>>> I tested on the imx8qxp platform. It is ARM64.
>>
>> On the mvneta driver/platform we saw a huge speedup replacing:
>>
>>   page_pool_release_page(rxq->page_pool, page);
>> with
>>   skb_mark_for_recycle(skb);
>>
>> As I mentioned: today page_pool has SKB recycle support (you might have
>> looked at drivers that didn't utilize this yet), thus you don't need to
>> release the page (page_pool_release_page) here. Instead you could simply
>> mark the SKB for recycling, unless the driver does some page refcnt
>> tricks I didn't notice.
>>
>> On the mvneta driver/platform the DMA unmap (in page_pool_release_page)
>> was very expensive. This imx8qxp platform might have a faster DMA unmap
>> in case it is cache-coherent.
>>
>> I would be very interested in knowing if skb_mark_for_recycle() helps on
>> this platform, for normal network stack performance.
>>
>
> Did a quick compare testing for the following 3 scenarios:

Thanks for doing this! :-)

> 1. original implementation
>
> shenwei@...0:~$ iperf -c 10.81.16.245 -w 2m -i 1
> ------------------------------------------------------------
> Client connecting to 10.81.16.245, TCP port 5001
> TCP window size:  416 KByte (WARNING: requested 1.91 MByte)
> ------------------------------------------------------------
> [  1] local 10.81.17.20 port 49154 connected with 10.81.16.245 port 5001
> [ ID] Interval        Transfer     Bandwidth
> [  1] 0.0000-1.0000 sec    104 MBytes   868 Mbits/sec
> [  1] 1.0000-2.0000 sec    105 MBytes   878 Mbits/sec
> [  1] 2.0000-3.0000 sec    105 MBytes   881 Mbits/sec
> [  1] 3.0000-4.0000 sec    105 MBytes   879 Mbits/sec
> [  1] 4.0000-5.0000 sec    105 MBytes   878 Mbits/sec
> [  1] 5.0000-6.0000 sec    105 MBytes   878 Mbits/sec
> [  1] 6.0000-7.0000 sec    104 MBytes   875 Mbits/sec
> [  1] 7.0000-8.0000 sec    104 MBytes   875 Mbits/sec
> [  1] 8.0000-9.0000 sec    104 MBytes   873 Mbits/sec
> [  1] 9.0000-10.0000 sec   104 MBytes   875 Mbits/sec
> [  1] 0.0000-10.0073 sec  1.02 GBytes   875 Mbits/sec
>
> 2. Page pool with page_pool_release_page
>
> shenwei@...0:~$ iperf -c 10.81.16.245 -w 2m -i 1
> ------------------------------------------------------------
> Client connecting to 10.81.16.245, TCP port 5001
> TCP window size:  416 KByte (WARNING: requested 1.91 MByte)
> ------------------------------------------------------------
> [  1] local 10.81.17.20 port 35924 connected with 10.81.16.245 port 5001
> [ ID] Interval        Transfer     Bandwidth
> [  1] 0.0000-1.0000 sec    101 MBytes   849 Mbits/sec
> [  1] 1.0000-2.0000 sec    102 MBytes   860 Mbits/sec
> [  1] 2.0000-3.0000 sec    102 MBytes   860 Mbits/sec
> [  1] 3.0000-4.0000 sec    102 MBytes   859 Mbits/sec
> [  1] 4.0000-5.0000 sec    103 MBytes   863 Mbits/sec
> [  1] 5.0000-6.0000 sec    103 MBytes   864 Mbits/sec
> [  1] 6.0000-7.0000 sec    103 MBytes   863 Mbits/sec
> [  1] 7.0000-8.0000 sec    103 MBytes   865 Mbits/sec
> [  1] 8.0000-9.0000 sec    103 MBytes   862 Mbits/sec
> [  1] 9.0000-10.0000 sec   102 MBytes   856 Mbits/sec
> [  1] 0.0000-10.0246 sec  1.00 GBytes   858 Mbits/sec
>
>
> 3. Page pool with skb_mark_for_recycle
>
> shenwei@...0:~$ iperf -c 10.81.16.245 -w 2m -i 1
> ------------------------------------------------------------
> Client connecting to 10.81.16.245, TCP port 5001
> TCP window size:  416 KByte (WARNING: requested 1.91 MByte)
> ------------------------------------------------------------
> [  1] local 10.81.17.20 port 42724 connected with 10.81.16.245 port 5001
> [ ID] Interval        Transfer     Bandwidth
> [  1] 0.0000-1.0000 sec    111 MBytes   931 Mbits/sec
> [  1] 1.0000-2.0000 sec    112 MBytes   935 Mbits/sec
> [  1] 2.0000-3.0000 sec    111 MBytes   934 Mbits/sec
> [  1] 3.0000-4.0000 sec    111 MBytes   934 Mbits/sec
> [  1] 4.0000-5.0000 sec    111 MBytes   934 Mbits/sec
> [  1] 5.0000-6.0000 sec    112 MBytes   935 Mbits/sec
> [  1] 6.0000-7.0000 sec    111 MBytes   934 Mbits/sec
> [  1] 7.0000-8.0000 sec    111 MBytes   933 Mbits/sec
> [  1] 8.0000-9.0000 sec    112 MBytes   935 Mbits/sec
> [  1] 9.0000-10.0000 sec   111 MBytes   933 Mbits/sec
> [  1] 0.0000-10.0069 sec  1.09 GBytes   934 Mbits/sec

This (page pool with skb_mark_for_recycle) is a very significant
performance improvement, and very close to the max goodput of a
1 Gbit/s link.

> For small packet size (64 bytes), all three cases have almost the same result:
>

To me this indicates that the DMA map/unmap operations on this platform
are indeed more expensive for larger packets. That is exactly what
page_pool avoids, by keeping the DMA mapping intact when recycling.

The driver still needs to DMA-sync, but I notice you set the page_pool
feature flag PP_FLAG_DMA_SYNC_DEV. This is good, as page_pool will then
try to reduce the sync size where possible, e.g. in this SKB case it
will reduce the DMA-sync to max_len=FEC_ENET_RX_FRSIZE, which should
also help performance.
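To spell out the pattern being discussed, below is a rough sketch of an
RX path built on page_pool with PP_FLAG_DMA_SYNC_DEV and
skb_mark_for_recycle(). This is illustrative only, not the actual FEC
patch: the function names, the XDP_PACKET_HEADROOM offset, the pool
sizing and the error handling are assumptions; only page_pool_create(),
page_pool_release_page(), skb_mark_for_recycle(), PP_FLAG_DMA_SYNC_DEV
and FEC_ENET_RX_FRSIZE come from the discussion above.

/* Illustrative sketch only -- not the actual FEC patch.
 * FEC_ENET_RX_FRSIZE comes from the driver header
 * (drivers/net/ethernet/freescale/fec.h); other names are placeholders.
 */
#include <linux/bpf.h>		/* XDP_PACKET_HEADROOM */
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <net/page_pool.h>

static struct page_pool *fec_create_rx_page_pool(struct device *dev,
						 unsigned int pool_size)
{
	struct page_pool_params pp = {
		.order		= 0,
		.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.pool_size	= pool_size,
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
		.offset		= XDP_PACKET_HEADROOM,
		/* With PP_FLAG_DMA_SYNC_DEV, page_pool only syncs up to
		 * max_len for the device, which is what reduces the
		 * per-packet DMA-sync cost.
		 */
		.max_len	= FEC_ENET_RX_FRSIZE,
	};

	return page_pool_create(&pp);
}

/* RX completion for one frame: build the SKB on top of the page_pool
 * page and mark it for recycling, instead of DMA-unmapping and
 * releasing the page.
 */
static void fec_rx_frame_sketch(struct page_pool *pool, struct page *page,
				unsigned int len, struct napi_struct *napi)
{
	struct sk_buff *skb;

	skb = build_skb(page_address(page), PAGE_SIZE);
	if (!skb) {
		/* Return the page to the pool, DMA mapping intact. */
		page_pool_recycle_direct(pool, page);
		return;
	}
	skb_reserve(skb, XDP_PACKET_HEADROOM);
	skb_put(skb, len);

	/* Old approach: page_pool_release_page(pool, page);
	 * which DMA-unmaps the page and hands it back to the page
	 * allocator. Marking the SKB instead lets the page (and its
	 * DMA mapping) return to the pool when the SKB is freed.
	 */
	skb_mark_for_recycle(skb);

	napi_gro_receive(napi, skb);
}

The key difference from the release-page variant is that the page keeps
its DMA mapping when it goes back to the pool, so the expensive unmap
seen on mvneta (and apparently on large packets here) disappears from
the fast path.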
> shenwei@...0:~$ iperf -c 10.81.16.245 -w 2m -i 1 -l 64
> ------------------------------------------------------------
> Client connecting to 10.81.16.245, TCP port 5001
> TCP window size:  416 KByte (WARNING: requested 1.91 MByte)
> ------------------------------------------------------------
> [  1] local 10.81.17.20 port 58204 connected with 10.81.16.245 port 5001
> [ ID] Interval        Transfer     Bandwidth
> [  1] 0.0000-1.0000 sec   36.9 MBytes   309 Mbits/sec
> [  1] 1.0000-2.0000 sec   36.6 MBytes   307 Mbits/sec
> [  1] 2.0000-3.0000 sec   36.6 MBytes   307 Mbits/sec
> [  1] 3.0000-4.0000 sec   36.5 MBytes   307 Mbits/sec
> [  1] 4.0000-5.0000 sec   37.1 MBytes   311 Mbits/sec
> [  1] 5.0000-6.0000 sec   37.2 MBytes   312 Mbits/sec
> [  1] 6.0000-7.0000 sec   37.1 MBytes   311 Mbits/sec
> [  1] 7.0000-8.0000 sec   37.1 MBytes   311 Mbits/sec
> [  1] 8.0000-9.0000 sec   37.1 MBytes   312 Mbits/sec
> [  1] 9.0000-10.0000 sec  37.2 MBytes   312 Mbits/sec
> [  1] 0.0000-10.0097 sec   369 MBytes   310 Mbits/sec
>
> Regards,
> Shenwei
>
>
>>>> By small packets, do you mean those under the copybreak limit?
>>>>
>>>> Please provide some benchmark numbers with your next patchset.
>>>
>>> Yes, the packet size is 64 bytes and it is under the copybreak limit.
>>> As the impact is not significant, I would prefer to remove the
>>> copybreak logic.
>>
>> +1 to removing this logic if possible, due to maintenance cost.
>>
>> --Jesper
>
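For readers following the thread: "copybreak" here refers to the common
driver trick of memcpy'ing very small frames into a freshly allocated
small SKB, so the original RX buffer can go straight back to the ring.
A rough sketch of that pattern follows; the threshold and names are
placeholders, not the FEC code.

/* Illustrative copybreak sketch -- placeholder names, not the FEC code. */
#include <linux/skbuff.h>
#include <linux/netdevice.h>

#define COPYBREAK_SKETCH_LEN	256	/* hypothetical threshold in bytes */

static struct sk_buff *rx_copybreak_sketch(struct napi_struct *napi,
					   const void *data, unsigned int len)
{
	struct sk_buff *skb;

	if (len > COPYBREAK_SKETCH_LEN)
		return NULL;	/* take the normal (zero-copy) RX path */

	/* Allocate a small SKB and copy the frame, so the original RX
	 * buffer/page can be reused for the next descriptor immediately.
	 */
	skb = napi_alloc_skb(napi, len);
	if (!skb)
		return NULL;

	skb_put_data(skb, data, len);
	return skb;
}

With page_pool recycling in place the original buffer already comes back
cheaply, which is the argument above for dropping this extra copy path
and its maintenance cost.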