Date:   Mon, 30 Mar 2020 10:16:26 -0400
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Yi Yang (杨燚)-Cloud Service Group 
        <yangyi01@...pur.com>
Cc:     "yang_y_yi@....com" <yang_y_yi@....com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "u9012063@...il.com" <u9012063@...il.com>
Subject: Re: [vger.kernel.org forwarded]Re: [vger.kernel.org forwarded]Re: [PATCH net-next] net/packet: fix TPACKET_V3 performance issue in case of TSO

On Mon, Mar 30, 2020 at 2:35 AM Yi Yang (杨燚)-Cloud Service Group <yangyi01@...pur.com> wrote:
>
> -----Original Message-----
> From: Willem de Bruijn [mailto:willemdebruijn.kernel@...il.com]
> Sent: March 30, 2020 9:52
> To: Yi Yang (杨燚)-Cloud Service Group <yangyi01@...pur.com>
> Cc: willemdebruijn.kernel@...il.com; yang_y_yi@....com; netdev@...r.kernel.org; u9012063@...il.com
> Subject: Re: [vger.kernel.org forwarded]Re: [vger.kernel.org forwarded]Re: [PATCH net-next] net/packet: fix TPACKET_V3 performance issue in case of TSO
>
> > iperf3 test result
> > -----------------------
> > [yangyi@...alhost ovs-master]$ sudo ../run-iperf3.sh
> > iperf3: no process found
> > Connecting to host 10.15.1.3, port 5201 [  4] local 10.15.1.2 port
> > 44976 connected to 10.15.1.3 port 5201
> > [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> > [  4]   0.00-10.00  sec  19.6 GBytes  16.8 Gbits/sec  106586    307 KBytes
> > [  4]  10.00-20.00  sec  19.5 GBytes  16.7 Gbits/sec  104625    215 KBytes
> > [  4]  20.00-30.00  sec  20.0 GBytes  17.2 Gbits/sec  106962    301 KBytes
>
> Thanks for the detailed info.
>
> So there is more going on there than a simple network tap: veth, which calls netif_rx and thus schedules delivery with a NAPI after a softirq (twice), plus tpacket for recv + send and OVS processing. And this is a single flow, so it is more sensitive to batching, drops and interrupt moderation than a workload of many flows.
>
> If anything, I would expect the ACKs on the return path to be the more likely cause for concern, as they are even less likely to fill a block before the timer. The return path is a separate packet socket?
>
> With initial small window size, I guess it might be possible for the entire window to be in transit. And as no follow-up data will arrive, this waits for the timeout. But at 3Gbps that is no longer the case.
> Again, the timeout is intrinsic to TPACKET_V3. If that is unacceptable, then TPACKET_V2 is a more logical choice. Here also in relation to timely ACK responses.
>
> Other users of TPACKET_V3 may be using fewer blocks of larger size. A change to retire blocks after one GSO packet will negatively affect their workloads. At the very least this should be an optional feature, similar to how I suggested converting to microseconds.
>
> [Yi Yang] My iperf3 test uses a TCP socket; the return path is the same socket as the forward path. BTW, this patch retires the current block only if a vnet header is present in the packet; I don't know what other use cases use the vnet header besides our scenario. In addition, I also have more conditions to limit this, but they impact performance. I'll try whether V2 can fix our issue; if it can't, this patch will be the only way to fix it.
>
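For reference, the block-retire timeout under discussion is the tp_retire_blk_tov field of struct tpacket_req3, passed to setsockopt(PACKET_RX_RING) once the socket is switched to TPACKET_V3. A minimal sketch of building that argument (field order per <linux/if_packet.h>; the ring dimensions here are illustrative, not taken from this thread):

```python
import struct

# Sketch: build the tpacket_req3 argument for
# setsockopt(fd, SOL_PACKET, PACKET_RX_RING, ...) with TPACKET_V3.
# Field order follows <linux/if_packet.h>; sizes are illustrative.
def tpacket_req3(block_size=1 << 22, block_nr=64,
                 frame_size=2048, retire_blk_tov_ms=60):
    frame_nr = (block_size * block_nr) // frame_size
    return struct.pack(
        "7I",                # seven unsigned ints
        block_size,          # tp_block_size
        block_nr,            # tp_block_nr
        frame_size,          # tp_frame_size
        frame_nr,            # tp_frame_nr
        retire_blk_tov_ms,   # tp_retire_blk_tov (milliseconds)
        0,                   # tp_sizeof_priv
        0)                   # tp_feature_req_word
```

tp_retire_blk_tov is specified in milliseconds (passing 0 lets the kernel pick a default), which is the granularity the earlier suggestion about converting to microseconds refers to.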

Thanks. Also interesting might be a short packet trace of packet
arrival on the bond device ports, taken at the steady state of 3 Gbps,
to observe when the inter-arrival time exceeds the 167 usec mean. Also
informative would be to learn whether, when retiring a block using your
patch, that block also holds one or more ACK packets along with the
GSO packet, as their delay might be the true source of throttling the sender.
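Such a trace could be post-processed with a few lines of script. A sketch (a hypothetical helper, not part of this thread) that flags inter-arrival gaps above the 167 usec mean, given packet timestamps extracted from a pcap:

```python
def long_gaps(timestamps_s, threshold_us=167.0):
    """Return inter-arrival gaps (in usec) exceeding threshold_us.

    timestamps_s: packet arrival times in seconds, ascending,
    e.g. as extracted from a tcpdump/pcap trace on the bond ports.
    """
    gaps = [(b - a) * 1e6 for a, b in zip(timestamps_s, timestamps_s[1:])]
    return [g for g in gaps if g > threshold_us]
```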

I think we need to understand the underlying problem better to
implement a robust fix that works for a variety of configurations, and
does not cause accidental regressions. The current patch works for
your setup, but I'm afraid that it might paper over the real issue.

It is a peculiar aspect of TPACKET_V3 that blocks are retired not when
a packet is written that fills them, but when the next packet arrives
and cannot find room. Again, at sustained rate that delay should be
immaterial. But it might be okay to measure remaining space after
write and decide to retire if below some watermark. I would prefer
that watermark to be a ratio of block size rather than whether the
packet is gso or not.
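The watermark idea above could look roughly like this (a sketch of the decision rule only, not actual kernel code; the names are made up):

```python
def should_retire_after_write(bytes_used, block_size, watermark_ratio=0.25):
    # Retire the block once remaining space drops below a fixed
    # fraction of block size, independent of whether the packet
    # just written was GSO. watermark_ratio is an assumed tunable.
    remaining = block_size - bytes_used
    return remaining < watermark_ratio * block_size
```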
