Message-ID: <CANn89i+xW+mwY=Y5_r7RPEavq63PjMAwek91+19VQLFnRYrR8g@mail.gmail.com>
Date: Sun, 19 Oct 2025 10:58:20 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Peng Yu <yupeng0921@...il.com>
Cc: ncardwell@...gle.com, kuniyu@...gle.com, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, Peng Yu <peng.yu@...baba-inc.com>
Subject: Re: [PATCH] net: set is_cwnd_limited when the small queue check fails
On Sun, Oct 19, 2025 at 10:43 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Sun, Oct 19, 2025 at 10:00 AM Peng Yu <yupeng0921@...il.com> wrote:
> >
> > The limit used by the small queue check is calculated from the pacing
> > rate, and the pacing rate is calculated from the cwnd. If the cwnd is
> > small, the small queue check may fail.
> > When the small queue check fails, the TCP layer sends fewer packets,
> > so tcp_is_cwnd_limited() always returns false and the cwnd never gets
> > a chance to be updated.
> > Because the cwnd is never updated, it stays small, so the pacing rate
> > stays small, the limit of the small queue check stays small, and the
> > small queue check keeps failing.
> > This is a kind of deadlock: when a TCP flow gets into this situation,
> > its throughput becomes very small, clearly lower than the throughput
> > it should achieve.
> > We set is_cwnd_limited to true when the small queue check fails, so
> > the cwnd gets a chance to be updated and the deadlock is broken.
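> >
> > To illustrate the loop, here is a rough userspace-only model (not
> > kernel code: the 1.2 pacing ratio, the >>10 shift and the congestion
> > avoidance step are simplified assumptions; the constants are taken
> > from the ss output below):
> >
> > #include <stdio.h>
> >
> > int main(void)
> > {
> > 	unsigned long cwnd = 28;          /* packets, from ss below */
> > 	unsigned long mss = 1448;         /* bytes */
> > 	double rtt = 0.023364;            /* seconds */
> > 	unsigned long wmem_alloc = 7712;  /* sk_wmem_alloc, skmem t7712 below */
> >
> > 	for (int round = 0; round < 5; round++) {
> > 		/* pacing_rate ~ ratio * cwnd * mss / rtt, in bytes/sec */
> > 		unsigned long pacing_rate = (unsigned long)(1.2 * cwnd * mss / rtt);
> > 		/* small queue limit ~ pacing_rate >> 10, about 1 ms of data */
> > 		unsigned long limit = pacing_rate >> 10;
> > 		int deferred = wmem_alloc > limit;
> >
> > 		printf("cwnd=%lu pacing=%lu B/s limit=%lu deferred=%d\n",
> > 		       cwnd, pacing_rate, limit, deferred);
> >
> > 		/* Without the patch a deferred send is not counted as
> > 		 * cwnd-limited, so cwnd never grows and the numbers repeat.
> > 		 */
> > 		if (!deferred)
> > 			cwnd += 1;	/* stand-in for congestion avoidance */
> > 	}
> > 	return 0;
> > }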
> >
> > The ss output below shows this issue:
> >
> > skmem:(r0,rb131072,
> > t7712, <------------------------------ wmem_alloc = 7712
> > tb243712,f2128,w219056,o0,bl0,d0)
> > ts sack cubic wscale:7,10 rto:224 rtt:23.364/0.019 ato:40 mss:1448
> > pmtu:8500 rcvmss:536 advmss:8448
> > cwnd:28 <------------------------------ cwnd=28
> > bytes_sent:2166208 bytes_acked:2148832 bytes_received:37
> > segs_out:1497 segs_in:751 data_segs_out:1496 data_segs_in:1
> > send 13882554bps lastsnd:7 lastrcv:2992 lastack:7
> > pacing_rate 27764216bps <--------------------- pacing_rate=27764216bps
> > delivery_rate 5786688bps delivered:1485 busy:2991ms unacked:12
> > rcv_space:57088 rcv_ssthresh:57088 notsent:188240
> > minrtt:23.319 snd_wnd:57088
> >
> > limit=(27764216 / 8) / 1024 = 3389 < 7712
> > So the small queue check fails. When it happens, the throughput is
> > clearly lower than in the normal situation.
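> >
> > That arithmetic can be checked with a small standalone program (a
> > sketch assuming the default sk_pacing_shift of 10; the in-kernel
> > tcp_small_queue_check() also applies a lower bound of
> > 2 * skb->truesize, which is omitted here):
> >
> > #include <stdio.h>
> >
> > int main(void)
> > {
> > 	unsigned long pacing_rate_bps = 27764216;        /* from ss above */
> > 	unsigned long pacing_rate = pacing_rate_bps / 8; /* bytes/sec */
> > 	unsigned long limit = pacing_rate >> 10;         /* ~1 ms of data */
> > 	unsigned long wmem_alloc = 7712;                 /* skmem t7712 above */
> >
> > 	printf("limit=%lu wmem_alloc=%lu -> %s\n", limit, wmem_alloc,
> > 	       wmem_alloc > limit ? "defer (check fails)" : "send");
> > 	return 0;
> > }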
> >
> > By setting is_cwnd_limited to true when the small queue check fails,
> > we can avoid this issue: the cwnd can grow to a reasonable size
> > (about 4000 in my test environment), and then the small queue check
> > no longer fails.
>
>
> >
> > Signed-off-by: Peng Yu <peng.yu@...baba-inc.com>
> > ---
> > net/ipv4/tcp_output.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index b94efb3050d2..8c70acf3a060 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -2985,8 +2985,10 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
> >  		    unlikely(tso_fragment(sk, skb, limit, mss_now, gfp)))
> >  			break;
> >
> > -		if (tcp_small_queue_check(sk, skb, 0))
> > +		if (tcp_small_queue_check(sk, skb, 0)) {
> > +			is_cwnd_limited = true;
> >  			break;
> > +		}
> >
> >  		/* Argh, we hit an empty skb(), presumably a thread
> >  		 * is sleeping in sendmsg()/sk_stream_wait_memory().
> > --
> > 2.47.3
>
> Sorry this makes no sense to me. CWND_LIMITED should not be hijacked.
>
> Something else is preventing your flows from reaching nominal speed,
> because we have not seen anything like that.
>
> It is probably a driver issue or a receive-side issue: instead of
> trying to work around it, please root-cause it.
BTW, we recently fixed a bug in tcp_tso_should_defer(); make sure to
try a kernel with this fix:

295ce1eb36ae47dc862d6c8a1012618a25516208 ("tcp: fix tcp_tso_should_defer() vs large RTT")