Date:   Wed, 9 Aug 2023 06:22:11 +0000
From:   Wei Fang <wei.fang@....com>
To:     Jesper Dangaard Brouer <hawk@...nel.org>,
        Jesper Dangaard Brouer <jbrouer@...hat.com>,
        Jakub Kicinski <kuba@...nel.org>
CC:     "davem@...emloft.net" <davem@...emloft.net>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "pabeni@...hat.com" <pabeni@...hat.com>,
        Shenwei Wang <shenwei.wang@....com>,
        Clark Wang <xiaoning.wang@....com>,
        "ast@...nel.org" <ast@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "john.fastabend@...il.com" <john.fastabend@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        dl-linux-imx <linux-imx@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        Andrew Lunn <andrew@...n.ch>
Subject: RE: [PATCH V3 net-next] net: fec: add XDP_TX feature support

> > Thanks very much!
> > You remind me: in the previous tests I always started the pktgen script
> > first and then ran the xdp2 program, so the transmit speed of the
> > generator always appeared greater than the speed of XDP_TX when I
> > stopped the script. But actually, the real-time transmit speed of the
> > generator had degraded to match the speed of XDP_TX.
> >
> 
> Good that we finally found the root cause; that explains why our
> code changes seemed to have no effect.  The generator gets affected and
> slowed down by the traffic that is bounced back to it. (I tried to hint at this
> earlier with the Ethernet flow-control settings.)
> 
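(For reference, the flow-control settings mentioned above can be inspected and disabled with ethtool; the interface name eth0 is just an example here:)

```shell
# Show current pause-frame (flow-control) settings on the generator NIC
ethtool -a eth0

# Disable RX/TX pause handling so bounced-back traffic cannot
# throttle the generator via Ethernet flow control
ethtool -A eth0 autoneg off rx off tx off
```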
> > So I turned off the rx function of the generator to avoid increasing
> > the CPU load of the generator due to the traffic returned from xdp2.
> 
> How did you turn off the rx function of the generator?
> (There are a couple of tricks I use.)
> 
Actually, I didn't really disable the rx function of the generator; I just made
the generator hardware automatically discard the traffic returned by xdp2.
That is, I utilized the MAC filter feature of the hardware and modified the
pktgen script so that the SMAC of the packets differs from the MAC address
of the generator's interface.
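A minimal sketch of that pktgen tweak, assuming the pktgen module is loaded (the interface name and both MAC addresses are examples, and the pgset helper mirrors the one from the kernel's samples/pktgen scripts):

```shell
# The spoofed src_mac must differ from the generator NIC's own MAC so
# that traffic bounced back by xdp2 is dropped by the MAC filter.
PGDEV=/proc/net/pktgen/eth1

pgset() {
    echo "$1" > "$PGDEV"
}

# Bind the device to pktgen thread 0
echo "rem_device_all" > /proc/net/pktgen/kpktgend_0
echo "add_device eth1" > /proc/net/pktgen/kpktgend_0

pgset "count 0"                      # run until stopped
pgset "pkt_size 64"
pgset "dst 192.168.1.2"              # DUT IP (example)
pgset "dst_mac 00:04:9f:aa:bb:cc"    # DUT MAC (example)
pgset "src_mac 00:11:22:33:44:55"    # spoofed SMAC, != generator's MAC

echo "start" > /proc/net/pktgen/pgctrl
```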


> > And I tested
> > the performance again. Below are the results.
> >
> > Result 1: current method
> > root@...8mpevk:~# ./xdp2 eth0
> > proto 17:     326539 pkt/s
> > proto 17:     326464 pkt/s
> > proto 17:     326528 pkt/s
> > proto 17:     326465 pkt/s
> > proto 17:     326550 pkt/s
> >
> > Result 2: sync_dma_len method
> > root@...8mpevk:~# ./xdp2 eth0
> > proto 17:     353918 pkt/s
> > proto 17:     352923 pkt/s
> > proto 17:     353900 pkt/s
> > proto 17:     352672 pkt/s
> > proto 17:     353912 pkt/s
> >
> 
> This looks more promising:
>   ((353912/326550)-1)*100 = 8.37% faster.
> 
> Or gaining/saving approx 236 nanosec per packet
> ((1/326550-1/353912)*10^9).
> 
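(Those figures can be reproduced with a quick awk check, using the best-case rates from results 1 and 2 above:)

```shell
# Relative speedup of result 2 over result 1, in percent
awk 'BEGIN { printf "%.3f\n", (353912/326550 - 1)*100 }'   # 8.379

# Per-packet saving in nanoseconds
awk 'BEGIN { printf "%.1f\n", (1/326550 - 1/353912)*1e9 }' # 236.8
```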
> > Note: the speed of the generator is about 935397pps.
> >
> > Comparing result 1 with result 2, the "sync_dma_len" method actually
> > improves the performance of XDP_TX, so the conclusion from the previous
> tests is *incorrect*.
> > I'm so sorry for that. :(
> >
> 
> I'm happy that we finally found the root-cause.
> Thanks for doing all the requested tests I asked for.
> 
> > In addition, I also tried the "dma_sync_len" method without
> > xdp_convert_buff_to_frame(), and the performance improved
> further. Below is the result.
> >
> > Result 3: sync_dma_len + not use xdp_convert_buff_to_frame() method
> > root@...8mpevk:~# ./xdp2 eth0
> > proto 17:     369261 pkt/s
> > proto 17:     369267 pkt/s
> > proto 17:     369206 pkt/s
> > proto 17:     369214 pkt/s
> > proto 17:     369126 pkt/s
> >
> > Therefore, I intend to use the "dma_sync_len" method without
> > xdp_convert_buff_to_frame() in the V5 patch. Thank you again,
> > Jesper and Jakub. You really helped me a lot. :)
> >
> 
> I suggest that the V5 patch still use xdp_convert_buff_to_frame(), and that
> you then send a follow-up patch (or a 2/2 patch) that removes the use of
> xdp_convert_buff_to_frame() for XDP_TX.  This way it is easier to keep track
> of the changes and improvements.
> 
Okay, I will do it.

> I would be very interested in knowing if the MMIO test results change after
> this correction to the testlab/generator.
> 
The performance is significantly improved, as you expected, but as I explained
before, I'm not sure whether there are potential risks other than increased
latency. So I'm not going to modify it at the moment.

Below is the result after I changed the logic to do an MMIO write on the rx-BDR
and the tx-BDR respectively at the end of the NAPI callback.

root@...8mpevk:~# ./xdp2 eth0
proto 17:     436020 pkt/s
proto 17:     436167 pkt/s
proto 17:     434205 pkt/s
proto 17:     436140 pkt/s
proto 17:     436115 pkt/s
