Message-ID: <fd8840fc769943cf9a9089c424efbd24@inspur.com>
Date:   Wed, 15 Apr 2020 03:33:49 +0000
From:   Yi Yang (杨燚)-云服务集团 
        <yangyi01@...pur.com>
To:     "willemdebruijn.kernel@...il.com" <willemdebruijn.kernel@...il.com>
CC:     "yang_y_yi@....com" <yang_y_yi@....com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "u9012063@...il.com" <u9012063@...il.com>
Subject: Re: [PATCH net-next] net/packet: fix TPACKET_V3 performance issue in case of TSO

-----Original Message-----
From: netdev-owner@...r.kernel.org [mailto:netdev-owner@...r.kernel.org] On Behalf Of Willem de Bruijn
Sent: 14 April 2020 22:04
To: Yi Yang (杨燚)-云服务集团 <yangyi01@...pur.com>
Cc: yang_y_yi@....com; netdev@...r.kernel.org; u9012063@...il.com
Subject: Re: [PATCH net-next] net/packet: fix TPACKET_V3 performance issue in case of TSO

> > > iperf3 test result
> > > -----------------------
> > > [yangyi@...alhost ovs-master]$ sudo ../run-iperf3.sh
> > > iperf3: no process found
> > > Connecting to host 10.15.1.3, port 5201 [  4] local 10.15.1.2 port
> > > 44976 connected to 10.15.1.3 port 5201
> > > [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> > > [  4]   0.00-10.00  sec  19.6 GBytes  16.8 Gbits/sec  106586    307 KBytes
> > > [  4]  10.00-20.00  sec  19.5 GBytes  16.7 Gbits/sec  104625    215 KBytes
> > > [  4]  20.00-30.00  sec  20.0 GBytes  17.2 Gbits/sec  106962    301 KBytes
> >
> > Thanks for the detailed info.
> >
> > So there is more going on there than a simple network tap: veth, which calls netif_rx and thus schedules delivery with a napi after a softirq (twice), plus tpacket for recv + send and OVS processing. And this is a single flow, so it is more sensitive to batching, drops and interrupt moderation than a workload of many flows.
> >
> > If anything, I would expect the ACKs on the return path to be the more likely cause for concern, as they are even less likely to fill a block before the timer. The return path is a separate packet socket?
> >
> > With initial small window size, I guess it might be possible for the entire window to be in transit. And as no follow-up data will arrive, this waits for the timeout. But at 3Gbps that is no longer the case.
> > Again, the timeout is intrinsic to TPACKET_V3. If that is unacceptable, then TPACKET_V2 is a more logical choice. Here also in relation to timely ACK responses.
> >
> > Other users of TPACKET_V3 may be using fewer blocks of larger size. A change to retire blocks after 1 GSO packet will negatively affect their workloads. At the very least this should be an optional feature, similar to how I suggested converting to microseconds.
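For reference, these knobs are all part of the TPACKET_V3 ring setup in userspace. A minimal sketch, with illustrative block/frame sizes and timeout (not values taken from the patch or from OVS) and error handling omitted:

  #include <sys/socket.h>
  #include <sys/mman.h>
  #include <arpa/inet.h>
  #include <linux/if_packet.h>
  #include <linux/if_ether.h>

  int main(void)
  {
      /* Requires CAP_NET_RAW. */
      int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
      int ver = TPACKET_V3;

      setsockopt(fd, SOL_PACKET, PACKET_VERSION, &ver, sizeof(ver));

      struct tpacket_req3 req = {
          .tp_block_size     = 1 << 22,  /* fewer, larger blocks favor batching */
          .tp_block_nr       = 64,
          .tp_frame_size     = 1 << 11,
          .tp_frame_nr       = ((1 << 22) / (1 << 11)) * 64,
          .tp_retire_blk_tov = 60,       /* block retire timeout, in milliseconds */
      };
      setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));

      void *ring = mmap(NULL, (size_t)req.tp_block_size * req.tp_block_nr,
                        PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      (void)ring;
      return 0;
  }

tp_retire_blk_tov takes milliseconds, which is presumably the granularity that the microseconds suggestion above refers to.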
> >
> > [Yi Yang] My iperf3 test uses a TCP socket; the return path is the same socket as the forward path. BTW, this patch retires the current block only if a vnet header is present in the packet, and I don't know of any other use cases that use the vnet header besides our scenario. In addition, I have more conditions to limit this, but they impact performance. I'll check whether V2 can fix our issue; if not, this patch will be the only way to fix it.
> >
>
> Thanks. Also interesting might be a short packet trace of packet arrival on the bond device ports, taken at the steady state of 3 Gbps, to observe when the inter-arrival time exceeds the 167 usec mean.
> Also informative would be to learn whether, when retiring a block using your patch, that block also holds one or more ACK packets along with the GSO packet, as their delay might be the true source of throttling the sender.
>
> I think we need to understand the underlying problem better to implement a robust fix that works for a variety of configurations and does not cause accidental regressions. The current patch works for your setup, but I'm afraid that it might paper over the real issue.
>
> It is a peculiar aspect of TPACKET_V3 that blocks are retired not when a packet is written that fills them, but when the next packet arrives and cannot find room. Again, at sustained rate that delay should be immaterial. But it might be okay to measure remaining space after write and decide to retire if below some watermark. I would prefer that watermark to be a ratio of block size rather than whether the packet is gso or not.
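As a concrete illustration of that watermark idea, a sketch with hypothetical names (this is not the af_packet.c code, only the shape of the check):

  #include <stdbool.h>

  /* After copying a packet into the current block, retire the block right
   * away if the space left is below a fixed fraction of the block size,
   * rather than keying the decision on whether the packet was GSO or
   * carried a vnet header. */
  static bool block_below_watermark(unsigned int block_size,
                                    unsigned int bytes_used,
                                    unsigned int watermark_pct)
  {
      unsigned int remaining = block_size - bytes_used;

      return remaining < block_size / 100 * watermark_pct;
  }

A caller in the receive path would then retire the block and wake the reader immediately, which is exactly what the retire timer would otherwise do later.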
>
> [Yi Yang] Sorry for the late reply, I missed this email. I did time
> every received frame; the inter-arrival time is highly dynamic and I
> can't find any valuable clues. But I did find that TCP ACK frames
> have a big impact on performance. These are small frames (no more
> than 100 bytes), and in the TPACKET_V3 case a block will hold a bunch
> of such TCP ACK frames, so these ACK frames aren't received and sent
> back to the receiver in time. I tried TPACKET_V2 and its performance
> is beyond what I expected: in kernel 5.5.9 it is better than this
> patch, about 11 Gbps. I also tried kernel 4.15.0 (from Ubuntu; it
> actually cherry-picks many fix patches from upstream, so it isn't the
> official 4.15.0), where its performance is about 14 Gbps, worse than
> this patch (which is 17 Gbps), so obviously the performance is
> kernel-related and platform-related. In the non-pmd case (i.e. sender
> and receiver are one thread and use the same CPU), TPACKET_V2 is much
> better than recvmmsg & sendmmsg. We have decided to use TPACKET_V2
> for TSO. But we don't know how we can reach higher performance than
> 14 Gbps; it looks like tpacket_v2/v3's cache flush operation has a
> side effect on performance (especially one flush per frame for
> TPACKET_V2).

Kernel 5.5.9 with TPACKET_V2 is better than this patch at 11 Gbps, but Ubuntu 4.15.0 is worse than this patch at 14 Gbps (this patch is 17)?

[Yi Yang] It's true that the performance of kernel 5.5.9 is worse than the Ubuntu kernel 4.15.0: when I tested this patch, 5.5.9 could only reach 11 Gbps, but the Ubuntu kernel 4.15.0 could reach 17 Gbps. I don't know why. The performance of recvmmsg & sendmmsg shows the same pattern, i.e. the Ubuntu kernel 4.15.0 is better than kernel 5.5.9, and the same is true for TPACKET_V2 for TSO, so maybe it is a performance regression on the kernel side. The default HZ is 1000 for kernel 5.5.9 but 250 for Ubuntu 4.15.0; I'm not sure if that is one of the factors.

How did you arrive at the conclusion that the cache flush operation is the main bottleneck?

[Yi Yang] A cache flush is high overhead, like a spinlock, especially if every frame triggers one. I know it is unavoidable for TPACKET, otherwise userspace can't see the changes the kernel made in a timely way.
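For context, the per-frame synchronization in question is the tp_status hand-off of the memory-mapped TPACKET_V2 ring. A minimal sketch of the userspace side (busy-waiting shown only for brevity; a real consumer would poll() the socket):

  #include <linux/if_packet.h>

  /* 'hdr' points at one slot of a PACKET_RX_RING mapped with TPACKET_V2. */
  static void consume_frame(struct tpacket2_hdr *hdr)
  {
      /* Wait until the kernel marks this frame as owned by userspace. */
      while (!(hdr->tp_status & TP_STATUS_USER))
          ;

      /* ... process the packet data at (char *)hdr + hdr->tp_mac ... */

      /* Full barrier so our reads complete before the slot is handed
       * back to the kernel; this per-frame hand-off is the cost being
       * discussed here. */
      __sync_synchronize();
      hdr->tp_status = TP_STATUS_KERNEL;
  }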

Good to hear that you verified that a main issue is the ACK delay.

Instead of packet sockets, you could also take a look at AF_XDP. There seems to be documentation on how to deploy it with OVS.

[Yi Yang] Yes, current OVS can support AF_XDP, but it needs recent kernels, and for our use cases its performance isn't better than tpacket. Most importantly, tpacket has been available since 3.10.0, so all current Linux distributions can run it; this is the major reason why we prefer tpacket.
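For completeness, the vnet header discussed in this thread is what lets GSO packets cross a packet socket without being segmented, and it is enabled per socket. A minimal sketch of the setup only (the exact placement of the header relative to the mmap ring frame is left aside):

  #include <sys/socket.h>
  #include <arpa/inet.h>
  #include <linux/if_packet.h>
  #include <linux/if_ether.h>
  #include <linux/virtio_net.h>

  int main(void)
  {
      int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
      int on = 1;

      /* Ask the kernel to pass a struct virtio_net_hdr with every packet,
       * so GSO "super-packets" and checksum-offload metadata survive the
       * trip through the packet socket instead of being segmented. */
      setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &on, sizeof(on));

      /* Per packet, that header carries e.g. gso_type
       * (VIRTIO_NET_HDR_GSO_TCPV4, ...), gso_size (the MSS), and
       * csum_start / csum_offset for partial checksums. */
      return 0;
  }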
