lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6khewwumc5l47xjnamcqabrltnl7yr2qvb22zrqmmoq2vwn7yv@epxcgnflmkbt>
Date: Fri, 24 Oct 2025 16:20:32 +0000
From: Dragos Tatulea <dtatulea@...dia.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Neal Cardwell <ncardwell@...gle.com>, Willem de Bruijn <willemb@...gle.com>, 
	Kuniyuki Iwashima <kuniyu@...gle.com>, Matthieu Baerts <matttbe@...nel.org>, 
	Mat Martineau <martineau@...nel.org>, Geliang Tang <geliang@...nel.org>, netdev@...r.kernel.org, 
	eric.dumazet@...il.com
Subject: Re: [PATCH net-next 3/3] tcp: fix too slow tcp_rcvbuf_grow() action

On Fri, Oct 24, 2025 at 08:42:21AM -0700, Eric Dumazet wrote:
> On Fri, Oct 24, 2025 at 8:25 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Fri, Oct 24, 2025 at 8:13 AM Dragos Tatulea <dtatulea@...dia.com> wrote:
> > >
> > > On 24.10.25 09:50, Eric Dumazet wrote:
> > > > While the blamed commits apparently avoided an overshoot,
> > > > they also limited how fast a sender can increase BDP at each RTT.
> > > >
> > > > This is not exactly a revert, we do not add the 16 * tp->advmss
> > > > cushion we had, and we are keeping the out_of_order_queue
> > > > contribution.
> > > >
> > > > Do the same in mptcp_rcvbuf_grow().
> > > >
> > > > Tested:
> > > >
> > > > emulated 50ms rtt (tcp_stream --tcp-tx-delay 50000), cubic 20 second flow.
> > > > net.ipv4.tcp_rmem set to "4096 131072 67000000"
> > > >
> > > > perf record -a -e tcp:tcp_rcvbuf_grow sleep 20
> > > > perf script
> > > >
> > > > Before:
> > > >
> > > > We can see we fail to roughly double RWIN at each RTT.
> > > > Sender is RWIN limited while CWND is ramping up (before getting tcp_wmem limited)
> > > >
> > > > tcp_stream 33793 [010]  825.717525: tcp:tcp_rcvbuf_grow: time=100869 rtt_us=50428 copied=49152 inq=0 space=40960 ooo=0 scaling_ratio=219 rcvbuf=131072 rcv_ssthresh=103970 window_clamp=112128 rcv_wnd=106496
> > > > tcp_stream 33793 [010]  825.768966: tcp:tcp_rcvbuf_grow: time=51447 rtt_us=50362 copied=86016 inq=0 space=49152 ooo=0 scaling_ratio=219 rcvbuf=131072 rcv_ssthresh=107474 window_clamp=112128 rcv_wnd=106496
> > > > tcp_stream 33793 [010]  825.821539: tcp:tcp_rcvbuf_grow: time=52577 rtt_us=50243 copied=114688 inq=0 space=86016 ooo=0 scaling_ratio=219 rcvbuf=201096 rcv_ssthresh=167377 window_clamp=172031 rcv_wnd=167936
> > > > tcp_stream 33793 [010]  825.871781: tcp:tcp_rcvbuf_grow: time=50248 rtt_us=50237 copied=167936 inq=0 space=114688 ooo=0 scaling_ratio=219 rcvbuf=268129 rcv_ssthresh=224722 window_clamp=229375 rcv_wnd=225280
> > > > tcp_stream 33793 [010]  825.922475: tcp:tcp_rcvbuf_grow: time=50698 rtt_us=50183 copied=241664 inq=0 space=167936 ooo=0 scaling_ratio=219 rcvbuf=392617 rcv_ssthresh=331217 window_clamp=335871 rcv_wnd=323584
> > > > tcp_stream 33793 [010]  825.973326: tcp:tcp_rcvbuf_grow: time=50855 rtt_us=50213 copied=339968 inq=0 space=241664 ooo=0 scaling_ratio=219 rcvbuf=564986 rcv_ssthresh=478674 window_clamp=483327 rcv_wnd=462848
> > > > tcp_stream 33793 [010]  826.023970: tcp:tcp_rcvbuf_grow: time=50647 rtt_us=50248 copied=491520 inq=0 space=339968 ooo=0 scaling_ratio=219 rcvbuf=794811 rcv_ssthresh=671778 window_clamp=679935 rcv_wnd=651264
> > > > tcp_stream 33793 [010]  826.074612: tcp:tcp_rcvbuf_grow: time=50648 rtt_us=50227 copied=700416 inq=0 space=491520 ooo=0 scaling_ratio=219 rcvbuf=1149124 rcv_ssthresh=974881 window_clamp=983039 rcv_wnd=942080
> > > > tcp_stream 33793 [010]  826.125452: tcp:tcp_rcvbuf_grow: time=50845 rtt_us=50225 copied=987136 inq=8192 space=700416 ooo=0 scaling_ratio=219 rcvbuf=1637502 rcv_ssthresh=1392674 window_clamp=1400831 rcv_wnd=1339392
> > > > tcp_stream 33793 [010]  826.175698: tcp:tcp_rcvbuf_grow: time=50250 rtt_us=50198 copied=1347584 inq=0 space=978944 ooo=0 scaling_ratio=219 rcvbuf=2288672 rcv_ssthresh=1949729 window_clamp=1957887 rcv_wnd=1945600
> > > > tcp_stream 33793 [010]  826.225947: tcp:tcp_rcvbuf_grow: time=50252 rtt_us=50240 copied=1945600 inq=0 space=1347584 ooo=0 scaling_ratio=219 rcvbuf=3150516 rcv_ssthresh=2687010 window_clamp=2695167 rcv_wnd=2691072
> > > > tcp_stream 33793 [010]  826.276175: tcp:tcp_rcvbuf_grow: time=50233 rtt_us=50224 copied=2691072 inq=0 space=1945600 ooo=0 scaling_ratio=219 rcvbuf=4548617 rcv_ssthresh=3883041 window_clamp=3891199 rcv_wnd=3887104
> > > > tcp_stream 33793 [010]  826.326403: tcp:tcp_rcvbuf_grow: time=50233 rtt_us=50229 copied=3887104 inq=0 space=2691072 ooo=0 scaling_ratio=219 rcvbuf=6291456 rcv_ssthresh=5370482 window_clamp=5382144 rcv_wnd=5373952
> > > > tcp_stream 33793 [010]  826.376723: tcp:tcp_rcvbuf_grow: time=50323 rtt_us=50218 copied=5373952 inq=0 space=3887104 ooo=0 scaling_ratio=219 rcvbuf=9087658 rcv_ssthresh=7755537 window_clamp=7774207 rcv_wnd=7757824
> > > > tcp_stream 33793 [010]  826.426991: tcp:tcp_rcvbuf_grow: time=50274 rtt_us=50196 copied=7757824 inq=180224 space=5373952 ooo=0 scaling_ratio=219 rcvbuf=12563759 rcv_ssthresh=10729233 window_clamp=10747903 rcv_wnd=10575872
> > > > tcp_stream 33793 [010]  826.477229: tcp:tcp_rcvbuf_grow: time=50241 rtt_us=50078 copied=10731520 inq=180224 space=7577600 ooo=0 scaling_ratio=219 rcvbuf=17715667 rcv_ssthresh=15136529 window_clamp=15155199 rcv_wnd=14983168
> > > > tcp_stream 33793 [010]  826.527482: tcp:tcp_rcvbuf_grow: time=50258 rtt_us=50153 copied=15138816 inq=360448 space=10551296 ooo=0 scaling_ratio=219 rcvbuf=24667870 rcv_ssthresh=21073410 window_clamp=21102591 rcv_wnd=20766720
> > > > tcp_stream 33793 [010]  826.577712: tcp:tcp_rcvbuf_grow: time=50234 rtt_us=50228 copied=21073920 inq=0 space=14778368 ooo=0 scaling_ratio=219 rcvbuf=34550339 rcv_ssthresh=29517041 window_clamp=29556735 rcv_wnd=29519872
> > > > tcp_stream 33793 [010]  826.627982: tcp:tcp_rcvbuf_grow: time=50275 rtt_us=50220 copied=29519872 inq=540672 space=21073920 ooo=0 scaling_ratio=219 rcvbuf=49268707 rcv_ssthresh=42090625 window_clamp=42147839 rcv_wnd=41627648
> > > > tcp_stream 33793 [010]  826.678274: tcp:tcp_rcvbuf_grow: time=50296 rtt_us=50185 copied=42053632 inq=761856 space=28979200 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57238168 window_clamp=57316406 rcv_wnd=56606720
> > > > tcp_stream 33793 [010]  826.728627: tcp:tcp_rcvbuf_grow: time=50357 rtt_us=50128 copied=43913216 inq=851968 space=41291776 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57290728 window_clamp=57316406 rcv_wnd=56524800
> > > > tcp_stream 33793 [010]  827.131364: tcp:tcp_rcvbuf_grow: time=50239 rtt_us=50127 copied=43843584 inq=655360 space=43061248 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57290728 window_clamp=57316406 rcv_wnd=56696832
> > > > tcp_stream 33793 [010]  827.181613: tcp:tcp_rcvbuf_grow: time=50254 rtt_us=50115 copied=43843584 inq=524288 space=43188224 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57290728 window_clamp=57316406 rcv_wnd=56807424
> > > > tcp_stream 33793 [010]  828.339635: tcp:tcp_rcvbuf_grow: time=50283 rtt_us=50110 copied=43843584 inq=458752 space=43319296 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57290728 window_clamp=57316406 rcv_wnd=56864768
> > > > tcp_stream 33793 [010]  828.440350: tcp:tcp_rcvbuf_grow: time=50404 rtt_us=50099 copied=43843584 inq=393216 space=43384832 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57290728 window_clamp=57316406 rcv_wnd=56922112
> > > > tcp_stream 33793 [010]  829.195106: tcp:tcp_rcvbuf_grow: time=50154 rtt_us=50077 copied=43843584 inq=196608 space=43450368 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57290728 window_clamp=57316406 rcv_wnd=57090048
> > > >
> > > > After:
> > > >
> > > > It takes few steps to increase RWIN. Sender is no longer RWIN limited.
> > > >
> > > > tcp_stream 50826 [010]  935.634212: tcp:tcp_rcvbuf_grow: time=100788 rtt_us=50315 copied=49152 inq=0 space=40960 ooo=0 scaling_ratio=219 rcvbuf=131072 rcv_ssthresh=103970 window_clamp=112128 rcv_wnd=106496
> > > > tcp_stream 50826 [010]  935.685642: tcp:tcp_rcvbuf_grow: time=51437 rtt_us=50361 copied=86016 inq=0 space=49152 ooo=0 scaling_ratio=219 rcvbuf=160875 rcv_ssthresh=132969 window_clamp=137623 rcv_wnd=131072
> > > > tcp_stream 50826 [010]  935.738299: tcp:tcp_rcvbuf_grow: time=52660 rtt_us=50256 copied=139264 inq=0 space=86016 ooo=0 scaling_ratio=219 rcvbuf=502741 rcv_ssthresh=411497 window_clamp=430079 rcv_wnd=413696
> > > > tcp_stream 50826 [010]  935.788544: tcp:tcp_rcvbuf_grow: time=50249 rtt_us=50233 copied=307200 inq=0 space=139264 ooo=0 scaling_ratio=219 rcvbuf=728690 rcv_ssthresh=618717 window_clamp=623371 rcv_wnd=618496
> > > > tcp_stream 50826 [010]  935.838796: tcp:tcp_rcvbuf_grow: time=50258 rtt_us=50202 copied=618496 inq=0 space=307200 ooo=0 scaling_ratio=219 rcvbuf=2450338 rcv_ssthresh=1855709 window_clamp=2096187 rcv_wnd=1859584
> > > > tcp_stream 50826 [010]  935.889140: tcp:tcp_rcvbuf_grow: time=50347 rtt_us=50166 copied=1261568 inq=0 space=618496 ooo=0 scaling_ratio=219 rcvbuf=4376503 rcv_ssthresh=3725291 window_clamp=3743961 rcv_wnd=3706880
> > > > tcp_stream 50826 [010]  935.939435: tcp:tcp_rcvbuf_grow: time=50300 rtt_us=50185 copied=2478080 inq=24576 space=1261568 ooo=0 scaling_ratio=219 rcvbuf=9082648 rcv_ssthresh=7733731 window_clamp=7769921 rcv_wnd=7692288
> > > > tcp_stream 50826 [010]  935.989681: tcp:tcp_rcvbuf_grow: time=50251 rtt_us=50221 copied=4915200 inq=114688 space=2453504 ooo=0 scaling_ratio=219 rcvbuf=16574936 rcv_ssthresh=14108110 window_clamp=14179339 rcv_wnd=14024704
> > > > tcp_stream 50826 [010]  936.039967: tcp:tcp_rcvbuf_grow: time=50289 rtt_us=50279 copied=9830400 inq=114688 space=4800512 ooo=0 scaling_ratio=219 rcvbuf=32695050 rcv_ssthresh=27896187 window_clamp=27969593 rcv_wnd=27815936
> > > > tcp_stream 50826 [010]  936.090172: tcp:tcp_rcvbuf_grow: time=50211 rtt_us=50200 copied=19841024 inq=114688 space=9715712 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57245176 window_clamp=57316406 rcv_wnd=57163776
> > > > tcp_stream 50826 [010]  936.140430: tcp:tcp_rcvbuf_grow: time=50262 rtt_us=50197 copied=39501824 inq=114688 space=19726336 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57245176 window_clamp=57316406 rcv_wnd=57163776
> > > > tcp_stream 50826 [010]  936.190527: tcp:tcp_rcvbuf_grow: time=50101 rtt_us=50071 copied=43655168 inq=262144 space=39387136 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57259192 window_clamp=57316406 rcv_wnd=57032704
> > > > tcp_stream 50826 [010]  936.240719: tcp:tcp_rcvbuf_grow: time=50197 rtt_us=50057 copied=43843584 inq=262144 space=43393024 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57259192 window_clamp=57316406 rcv_wnd=57032704
> > > > tcp_stream 50826 [010]  936.341271: tcp:tcp_rcvbuf_grow: time=50297 rtt_us=50123 copied=43843584 inq=131072 space=43581440 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57259192 window_clamp=57316406 rcv_wnd=57147392
> > > > tcp_stream 50826 [010]  936.642503: tcp:tcp_rcvbuf_grow: time=50131 rtt_us=50084 copied=43843584 inq=0 space=43712512 ooo=0 scaling_ratio=219 rcvbuf=67000000 rcv_ssthresh=57259192 window_clamp=57316406 rcv_wnd=57262080
> > > >
> > > > Fixes: 65c5287892e9 ("tcp: fix sk_rcvbuf overshoot")
> > > > Fixes: e118cdc34dd1 ("mptcp: rcvbuf auto-tuning improvement")
> > > > Reported-by: Neal Cardwell <ncardwell@...gle.com>
> > > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > > > ---
> > > >  net/ipv4/tcp_input.c | 8 +++++++-
> > > >  net/mptcp/protocol.c | 7 +++++++
> > > >  2 files changed, 14 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > index c8cfd700990f28a8bc64e4353a2c78a82bb6bcb2..f004072654a4c50da14b9dafc46133feb71f12cd 100644
> > > > --- a/net/ipv4/tcp_input.c
> > > > +++ b/net/ipv4/tcp_input.c
> > > > @@ -896,6 +896,7 @@ void tcp_rcvbuf_grow(struct sock *sk, u32 newval)
> > > >       const struct net *net = sock_net(sk);
> > > >       struct tcp_sock *tp = tcp_sk(sk);
> > > >       u32 rcvwin, rcvbuf, cap, oldval;
> > > > +     u64 grow;
> > > >
> > > >       oldval = tp->rcvq_space.space;
> > > >       tp->rcvq_space.space = newval;
> > > > @@ -904,9 +905,14 @@ void tcp_rcvbuf_grow(struct sock *sk, u32 newval)
> > > >           (sk->sk_userlocks & SOCK_RCVBUF_LOCK))
> > > >               return;
> > > >
> > > > -     /* slow start: allow the sender to double its rate. */
> > > > +     /* DRS is always one RTT late. */
> > > >       rcvwin = newval << 1;
> > > >
> > > > +     /* slow start: allow the sender to double its rate. */
> > > > +     grow = (u64)rcvwin * (newval - oldval);
> > > > +     do_div(grow, oldval);
> > > > +     rcvwin += grow << 1;
> > > > +
> > > >       if (!RB_EMPTY_ROOT(&tp->out_of_order_queue))
> > > >               rcvwin += TCP_SKB_CB(tp->ooo_last_skb)->end_seq - tp->rcv_nxt;
> > > >
> > > Hi Eric,
> > >
> > > When applying this series I see a regression in a simple 25G iperf test:
> > > retransmissions are seen due to packet drops (out of buffer) on the
> > > server side.
> > >
> > > The test:
> > > - server: iperf3 -s -A 5
> > > - client: iperf3 -c 1.1.1.1 -B 25G
> > > - Configuration:
> > >   - Server has a single queue with affinity set on CPU 5.
> > >   - Ring size: 1K (4K ring size seems ok)
> > >   - MTU: 1500
> > >   - Client uses TSO, server uses SW GRO.
> > >
> > > Before series (includes first patch):
> > > <...>-2192  [005]   162.451893: tcp_rcvbuf_grow: time=1622 rtt_us=1596 copied=76781 inq=30408 space=14480 ooo=0 scaling_ratio=188 rcvbuf=131072 rcv_ssthresh=91990 window_clamp=96256 rcv_wnd=66560
> > > <...>-2192  [005]   162.451998: tcp_rcvbuf_grow: time=106 rtt_us=105 copied=158720 inq=0 space=46373 ooo=0 scaling_ratio=188 rcvbuf=131072 rcv_ssthresh=91990 window_clamp=96256 rcv_wnd=92160
> > > <...>-2192  [005]   162.453254: tcp_rcvbuf_grow: time=142 rtt_us=44 copied=292496 inq=91512 space=158720 ooo=0 scaling_ratio=188 rcvbuf=432258 rcv_ssthresh=270533 window_clamp=317439 rcv_wnd=253952
> > > <...>-2192  [005]   162.454446: tcp_rcvbuf_grow: time=113 rtt_us=44 copied=343176 inq=127424 space=200984 ooo=0 scaling_ratio=188 rcvbuf=547360 rcv_ssthresh=349656 window_clamp=401967 rcv_wnd=345088
> > > <...>-2192  [005]   162.455726: tcp_rcvbuf_grow: time=52 rtt_us=44 copied=264464 inq=40544 space=215752 ooo=0 scaling_ratio=188 rcvbuf=587579 rcv_ssthresh=391036 window_clamp=431503 rcv_wnd=194560
> > > <...>-2192  [005]   162.456444: tcp_rcvbuf_grow: time=37 rtt_us=36 copied=322560 inq=0 space=223920 ooo=0 scaling_ratio=188 rcvbuf=609824 rcv_ssthresh=391036 window_clamp=447839 rcv_wnd=323584
> > > <...>-2192  [005]   162.456865: tcp_rcvbuf_grow: time=40 rtt_us=36 copied=421840 inq=73848 space=322560 ooo=0 scaling_ratio=188 rcvbuf=878461 rcv_ssthresh=581105 window_clamp=645119 rcv_wnd=515072
> > > <...>-2192  [005]   162.457762: tcp_rcvbuf_grow: time=38 rtt_us=36 copied=430176 inq=65160 space=347992 ooo=0 scaling_ratio=188 rcvbuf=947722 rcv_ssthresh=631969 window_clamp=695983 rcv_wnd=467968
> > > <...>-2192  [005]   162.463191: tcp_rcvbuf_grow: time=35 rtt_us=34 copied=411336 inq=0 space=365016 ooo=0 scaling_ratio=188 rcvbuf=994086 rcv_ssthresh=666017 window_clamp=730031 rcv_wnd=354304
> > > <...>-2192  [005]   162.469069: tcp_rcvbuf_grow: time=38 rtt_us=34 copied=444520 inq=0 space=411336 ooo=0 scaling_ratio=188 rcvbuf=1120234 rcv_ssthresh=783379 window_clamp=822671 rcv_wnd=679936
> > >
> > > After series:
> > > <...>-2585  [005]  1061.768676: tcp_rcvbuf_grow: time=623 rtt_us=600 copied=72437 inq=28960 space=14480 ooo=0 scaling_ratio=188 rcvbuf=131072 rcv_ssthresh=81968 window_clamp=96256 rcv_wnd=82944
> 
> Looking again, it seems the initial rtt_us is much bigger than the
> rtt_us of 55 usec you have later (or 34 usec on previous samples)
> 
Right. It is bigger in both cases. Didn't realize that it is not
expected.

> DRS is fooled by the apparent explosion (more than traditional Slow
> Start) of the incoming traffic.
> 
> 1) Do you have a large initial cwnd ? (standard is IW10)
Nope:
09.390218 IP 1.1.1.2.45754 > 1.1.1.1.5201: Flags [S], seq 2278471684, win 64240, options [mss 1460,sackOK,TS val 4277109415 ecr 0,nop,wscale 10], length 0                                                                                                                                                                                                                        
09.390224 IP 1.1.1.1.5201 > 1.1.1.2.45754: Flags [S.], seq 533508084, ack 2278471685, win 65160, options [mss 1460,sackOK,TS val 1442753584 ecr 4277109415,nop,wscale 10], length 0                                                                                                                                                                                               
09.390242 IP 1.1.1.2.45754 > 1.1.1.1.5201: Flags [.], ack 1, win 63, options [nop,nop,TS val 4277109415 ecr 1442753584], length 0                                                                                                                                                                                                                                                 
09.390248 IP 1.1.1.2.45754 > 1.1.1.1.5201: Flags [P.], seq 1:38, ack 1, win 63, options [nop,nop,TS val 4277109415 ecr 1442753584], length 37                                                                                                                                                                                                                                     
09.390250 IP 1.1.1.1.5201 > 1.1.1.2.45754: Flags [.], ack 38, win 64, options [nop,nop,TS val 1442753584 ecr 4277109415], length 0                                                                                                                                                                                                                                                
09.390722 IP 1.1.1.1.5201 > 1.1.1.2.45742: Flags [P.], seq 3:4, ack 171, win 64, options [nop,nop,TS val 1442753584 ecr 4277109415], length 1                                                                                                                                                                                                                                     
09.390727 IP 1.1.1.1.5201 > 1.1.1.2.45742: Flags [P.], seq 4:5, ack 171, win 64, options [nop,nop,TS val 1442753584 ecr 4277109415], length 1                                                                                                                                                                                                                                     
09.390746 IP 1.1.1.2.45742 > 1.1.1.1.5201: Flags [.], ack 5, win 63, options [nop,nop,TS val 4277109415 ecr 1442753584], length 0                                                                                                                                                                                                                                                 
09.390758 IP 1.1.1.2.45754 > 1.1.1.1.5201: Flags [P.], seq 38:7278, ack 1, win 63, options [nop,nop,TS val 4277109415 ecr 1442753584], length 7240                                                                                                                                                                                                                                
09.390760 IP 1.1.1.2.45754 > 1.1.1.1.5201: Flags [P.], seq 7278:14518, ack 1, win 63, options [nop,nop,TS val 4277109415 ecr 1442753584], length 7240                                                                                                                                                                                                                             
09.390763 IP 1.1.1.1.5201 > 1.1.1.2.45754: Flags [.], ack 7278, win 79, options [nop,nop,TS val 1442753584 ecr 4277109415], length 0                                                                                                                                                                                                                                              
09.390767 IP 1.1.1.1.5201 > 1.1.1.2.45754: Flags [.], ack 14518, win 81, options [nop,nop,TS val 1442753584 ecr 4277109415], length 0                                                                                                                                                                                                                                             
09.390787 IP 1.1.1.2.45754 > 1.1.1.1.5201: Flags [P.], seq 14518:24654, ack 1, win 63, options [nop,nop,TS val 4277109416 ecr 1442753584], length 10136                                                                                                                                                                                                                           
09.390789 IP 1.1.1.2.45754 > 1.1.1.1.5201: Flags [P.], seq 24654:39134, ack 1, win 63, options [nop,nop,TS val 4277109416 ecr 1442753584], length 14480                                                                                                                                                                                                                           
09.390791 IP 1.1.1.1.5201 > 1.1.1.2.45754: Flags [.], ack 24654, win 85, options [nop,nop,TS val 1442753584 ecr 4277109416], length 0                                                                                                                                                                                                                                             
09.390793 IP 1.1.1.1.5201 > 1.1.1.2.45754: Flags [.], ack 39134, win 71, options [nop,nop,TS val 1442753584 ecr 4277109416], length 0                                                                                                                                                                                                                                             
09.390795 IP 1.1.1.2.45754 > 1.1.1.1.5201: Flags [P.], seq 39134:43478, ack 1, win 63, options [nop,nop,TS val 4277109416 ecr 1442753584], length 4344                                                                                                                                                                                                                            
09.390796 IP 1.1.1.1.5201 > 1.1.1.2.45754: Flags [.], ack 43478, win 67, options [nop,nop,TS val 1442753584 ecr 4277109416], length 0                                                                                                                                                                                                                                             
09.390803 IP 1.1.1.2.45754 > 1.1.1.1.5201: Flags [P.], seq 43478:46374, ack 1, win 63, options [nop,nop,TS val 4277109416 ecr 1442753584], length 2896                                                                                                                                                                                                                            
09.390805 IP 1.1.1.1.5201 > 1.1.1.2.45754: Flags [.], ack 46374, win 65, options [nop,nop,TS val 1442753584 ecr 4277109416], length 0                                                                                                                                                                                                                                             
09.391880 IP 1.1.1.2.45754 > 1.1.1.1.5201: Flags [P.], seq 46374:76782, ack 1, win 63, options [nop,nop,TS val 4277109417 ecr 1442753584], length 30408
...

> 2) iperf3 does not seem to be able to read the queue fast enough. Is
> it reading 1000 bytes at a time ?
>
iperf3 is reading 128K bytes at a time.

> > > <...>-2585  [005]  1061.769859: tcp_rcvbuf_grow: time=89 rtt_us=55 copied=250560 inq=46336 space=43477 ooo=0 scaling_ratio=188 rcvbuf=592631 rcv_ssthresh=302062 window_clamp=435213 rcv_wnd=230400
> > > <...>-2585  [005]  1061.775618: tcp_rcvbuf_grow: time=56 rtt_us=55 copied=405296 inq=140016 space=204224 ooo=0 scaling_ratio=188 rcvbuf=4668930 rcv_ssthresh=1927847 window_clamp=3428745 rcv_wnd=1928192
> > > <...>-2585  [005]  1061.777324: tcp_rcvbuf_grow: time=57 rtt_us=55 copied=450664 inq=131072 space=265280 ooo=0 scaling_ratio=188 rcvbuf=4668930 rcv_ssthresh=3106743 window_clamp=3428745 rcv_wnd=3006464
> > > <...>-2585  [005]  1061.783411: tcp_rcvbuf_grow: time=58 rtt_us=55 copied=521280 inq=41160 space=319592 ooo=0 scaling_ratio=188 rcvbuf=4668930 rcv_ssthresh=3364731 window_clamp=3428745 rcv_wnd=2086912
> > > <...>-2585  [005]  1061.790393: tcp_rcvbuf_grow: time=55 rtt_us=55 copied=524288 inq=0 space=480120 ooo=0 scaling_ratio=188 rcvbuf=4668930 rcv_ssthresh=3364731 window_clamp=3428745 rcv_wnd=2492416
> > > <...>-2585  [005]  1061.935387: tcp_rcvbuf_grow: time=55 rtt_us=55 copied=537824 inq=0 space=524288 ooo=0 scaling_ratio=188 rcvbuf=4668930 rcv_ssthresh=3364731 window_clamp=3428745 rcv_wnd=2258944
> > > <...>-2585  [005]  1062.977374: tcp_rcvbuf_grow: time=57 rtt_us=55 copied=545064 inq=0 space=537824 ooo=0 scaling_ratio=188 rcvbuf=4668930 rcv_ssthresh=3428745 window_clamp=3428745 rcv_wnd=2223104
> > > <...>-2585  [005]  1064.873376: tcp_rcvbuf_grow: time=57 rtt_us=55 copied=549408 inq=0 space=545064 ooo=0 scaling_ratio=188 rcvbuf=4668930 rcv_ssthresh=3428745 window_clamp=3428745 rcv_wnd=2509824
> > > <...>-2585  [005]  1065.984340: tcp_rcvbuf_grow: time=59 rtt_us=55 copied=574024 inq=0 space=549408 ooo=0 scaling_ratio=188 rcvbuf=4668930 rcv_ssthresh=3428745 window_clamp=3428745 rcv_wnd=2336768
> > > <...>-2585  [005]  1066.210718: tcp_rcvbuf_grow: time=410 rtt_us=55 copied=589448 inq=0 space=574024 ooo=0 scaling_ratio=188 rcvbuf=4668930 rcv_ssthresh=3428745 window_clamp=3428745 rcv_wnd=3364864
> > >
> > > Is this expected?
> >
> > Typical cubic reaction I would say, with pfifo_fast ?
> >
Yes.

> > I do not think that artificially slowing down senders by not growing
> > rwin fast enough is the right answer.
> >
> > If you have drops, that is because the sender is sending too fast or
> > with too large burts.
> >
> > Try using the fq packet scheduler and see if it helps.
Switched to fq on the sender but it didn't help. I suppose that in this
case having pfifo_fast on the receiver doesn't change much.

Thanks,
Dragos

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ