Message-ID: <1518821435.55655.6.camel@gmail.com>
Date: Fri, 16 Feb 2018 14:50:35 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Eric Dumazet <edumazet@...gle.com>,
Oleksandr Natalenko <oleksandr@...alenko.name>
Cc: Neal Cardwell <ncardwell@...gle.com>,
"David S. Miller" <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>,
Yuchung Cheng <ycheng@...gle.com>,
Soheil Hassas Yeganeh <soheil@...gle.com>,
Jerry Chu <hkchu@...gle.com>, Dave Taht <dave.taht@...il.com>
Subject: Re: TCP and BBR: reproducibly low cwnd and bandwidth
On Fri, 2018-02-16 at 12:54 -0800, Eric Dumazet wrote:
> On Fri, Feb 16, 2018 at 9:25 AM, Oleksandr Natalenko
> <oleksandr@...alenko.name> wrote:
> > Hi.
> >
> > On Friday, 16 February 2018 17:33:48 CET Neal Cardwell wrote:
> > > Thanks for the detailed report! Yes, this sounds like an issue in BBR. We
> > > have not run into this one in our team, but we will try to work with you to
> > > fix this.
> > >
> > > Would you be able to take a sender-side tcpdump trace of the slow BBR
> > > transfer ("v4.13 + BBR + fq_codel == Not OK")? Packet headers only would be
> > > fine. Maybe something like:
> > >
> > > tcpdump -w /tmp/test.pcap -c1000000 -s 100 -i eth0 port $PORT
> >
> > So, going on with two real HW hosts. They are both running latest stock Arch
> > Linux kernel (4.15.3-1-ARCH, CONFIG_PREEMPT=y, CONFIG_HZ=1000) and are
> > interconnected with 1 Gbps link (via switch if that matters). Using iperf3,
> > running each test for 20 seconds.
> >
> > Having BBR+fq_codel (or pfifo_fast, same result) on both hosts:
> >
> > Client to server: 112 Mbits/sec
> > Server to client: 96.1 Mbits/sec
> >
> > Having BBR+fq on both hosts:
> >
> > Client to server: 347 Mbits/sec
> > Server to client: 397 Mbits/sec
> >
> > Having YeAH+fq on both hosts:
> > [1] https://natalenko.name/myfiles/bbr/
> >
>
> Something fishy really :
>
> 09:18:31.449903 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [P.],
> seq 76745:79641, ack 38, win 227, options [nop,nop,TS val 2327043753
> ecr 3190508870], length 2896
> 09:18:31.449916 IP 172.29.28.55.14936 > 172.29.28.1.5201: Flags [.],
> ack 79641, win 1011, options [nop,nop,TS val 3190508870 ecr
> 2327043753], length 0
> 09:18:31.449925 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 79641:83985, ack 38, win 227, options [nop,nop,TS val 2327043753
> ecr 3190508870], length 4344
> 09:18:31.449936 IP 172.29.28.55.14936 > 172.29.28.1.5201: Flags [.],
> ack 83985, win 987, options [nop,nop,TS val 3190508870 ecr
> 2327043753], length 0
> 09:18:31.450112 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 83985:86881, ack 38, win 227, options [nop,nop,TS val 2327043753
> ecr 3190508870], length 2896
> 09:18:31.450124 IP 172.29.28.55.14936 > 172.29.28.1.5201: Flags [.],
> ack 86881, win 971, options [nop,nop,TS val 3190508871 ecr
> 2327043753], length 0
> 09:18:31.450299 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 86881:91225, ack 38, win 227, options [nop,nop,TS val 2327043753
> ecr 3190508870], length 4344
> 09:18:31.450313 IP 172.29.28.55.14936 > 172.29.28.1.5201: Flags [.],
> ack 91225, win 947, options [nop,nop,TS val 3190508871 ecr
> 2327043753], length 0
> 09:18:31.450491 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [P.],
> seq 91225:92673, ack 38, win 227, options [nop,nop,TS val 2327043753
> ecr 3190508870], length 1448
> 09:18:31.450505 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 92673:94121, ack 38, win 227, options [nop,nop,TS val 2327043753
> ecr 3190508871], length 1448
> 09:18:31.450511 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [P.],
> seq 94121:95569, ack 38, win 227, options [nop,nop,TS val 2327043754
> ecr 3190508871], length 1448
> 09:18:31.450720 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 95569:101361, ack 38, win 227, options [nop,nop,TS val 2327043754
> ecr 3190508871], length 5792
> 09:18:31.450932 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 101361:105705, ack 38, win 227, options [nop,nop,TS val 2327043754
> ecr 3190508871], length 4344
> 09:18:31.451132 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 105705:110049, ack 38, win 227, options [nop,nop,TS val 2327043754
> ecr 3190508871], length 4344
> 09:18:31.451342 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 110049:111497, ack 38, win 227, options [nop,nop,TS val 2327043754
> ecr 3190508871], length 1448
> 09:18:31.455841 IP 172.29.28.1.5201 > 172.29.28.55.14936: Flags [.],
> seq 111497:112945, ack 38, win 227, options [nop,nop,TS val 2327043759
> ecr 3190508871], length 1448
>
> Not only does the receiver suddenly add a 40 ms delay, but note that it
> also acknowledges all prior segments (ack 112945) with a wrong ecr
> value (2327043753) instead of 2327043759.

If you use

    tcptrace -R test_s2c.pcap
    xplot.org d2c_rtt.xpl

then you'll see plenty of suspect 40 ms RTT samples.

It looks like the receiver misses wakeups for some reason, and only the
TCP delayed ACK timer is helping.

So it does not look like a sender-side issue to me.
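
For anyone who wants to look for the same pattern without plotting, here is a minimal Python sketch that scans tcpdump text output for ACKs that arrive late or echo an older TSval than the data segment they acknowledge. It is an illustration only, not a tool used in this thread: the function name `suspicious_acks`, the 10 ms threshold, and the regular expression are assumptions. It expects one packet per line (the trace quoted above is wrapped by the mail client and would need unwrapping first), and it ignores TSval wraparound and captures that cross midnight.

```python
import re
from datetime import datetime

# One unwrapped tcpdump text line, e.g.
# 09:18:31.449916 IP a.1 > b.2: Flags [.], ack 79641, win 1011,
#   options [nop,nop,TS val 3190508870 ecr 2327043753], length 0
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[[^\]]*\], (?:seq \d+:(?P<seq_end>\d+), )?ack (?P<ack>\d+), "
    r".*TS val (?P<tsval>\d+) ecr (?P<ecr>\d+)"
)


def parse_time(text):
    """Parse tcpdump's default HH:MM:SS.microseconds timestamp."""
    return datetime.strptime(text, "%H:%M:%S.%f")


def suspicious_acks(lines, sender, min_delay_ms=10.0):
    """Return (ack, delay_ms, ecr, expected_tsval) for ACKs sent to `sender`
    that arrive at least `min_delay_ms` after the data segment ending at
    `ack`, or whose ecr is older than that segment's TSval.

    `sender` is the data sender's "address.port" string as tcpdump prints it.
    """
    sent = {}  # seq_end -> (send time, TSval) for data segments from sender
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a TCP line with timestamps; skip it
        when = parse_time(m["time"])
        if m["src"] == sender and m["seq_end"] is not None:
            # Data segment from the sender: remember when it left and its TSval.
            sent[int(m["seq_end"])] = (when, int(m["tsval"]))
        elif m["dst"] == sender and int(m["ack"]) in sent:
            # ACK that lands exactly on the end of a segment we saw.
            sent_at, tsval = sent[int(m["ack"])]
            delay_ms = (when - sent_at).total_seconds() * 1000.0
            ecr = int(m["ecr"])
            if delay_ms >= min_delay_ms or ecr < tsval:
                results.append((int(m["ack"]), delay_ms, ecr, tsval))
    return results
```

With the addresses from the trace above, the call would be something like `suspicious_acks(open("trace.txt"), "172.29.28.1.5201")`, where `trace.txt` is a hypothetical file holding `tcpdump -nn -r test.pcap` output. Treat the result as a hint for where to look rather than a verdict: an ecr older than the newest covered TSval mostly tells you the ACK was held back, which matches the delayed ACK timer theory.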