Message-ID: <CANn89iLsDDQuuQF2i73_-HaHMUwd80Q_ePcoQRy_8GxY2N4eMQ@mail.gmail.com>
Date: Sun, 19 Oct 2025 10:43:28 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Peng Yu <yupeng0921@...il.com>
Cc: ncardwell@...gle.com, kuniyu@...gle.com, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, Peng Yu <peng.yu@...baba-inc.com>
Subject: Re: [PATCH] net: set is_cwnd_limited when the small queue check fails
On Sun, Oct 19, 2025 at 10:00 AM Peng Yu <yupeng0921@...il.com> wrote:
>
> The limit of the small queue check is calculated from the pacing rate,
> and the pacing rate is calculated from the cwnd. If the cwnd is small,
> the small queue check may fail.
> When the small queue check fails, the TCP layer sends fewer packets,
> so tcp_is_cwnd_limited() always returns false and the cwnd has no
> chance to get updated.
> Because the cwnd stays small, the pacing rate stays small, the limit
> of the small queue check stays small, and the small queue check keeps
> failing.
> This is a kind of deadlock: once a TCP flow gets into this situation,
> its throughput is very small, clearly less than the throughput it
> should have.
> We set is_cwnd_limited to true when the small queue check fails, so
> the cwnd has a chance to get updated and the deadlock is broken.
>
> Below ss output shows this issue:
>
> skmem:(r0,rb131072,
> t7712, <------------------------------ wmem_alloc = 7712
> tb243712,f2128,w219056,o0,bl0,d0)
> ts sack cubic wscale:7,10 rto:224 rtt:23.364/0.019 ato:40 mss:1448
> pmtu:8500 rcvmss:536 advmss:8448
> cwnd:28 <------------------------------ cwnd=28
> bytes_sent:2166208 bytes_acked:2148832 bytes_received:37
> segs_out:1497 segs_in:751 data_segs_out:1496 data_segs_in:1
> send 13882554bps lastsnd:7 lastrcv:2992 lastack:7
> pacing_rate 27764216bps <--------------------- pacing_rate=27764216bps
> delivery_rate 5786688bps delivered:1485 busy:2991ms unacked:12
> rcv_space:57088 rcv_ssthresh:57088 notsent:188240
> minrtt:23.319 snd_wnd:57088
>
> limit=(27764216 / 8) / 1024 = 3389 < 7712
> So the small queue check fails. When this happens, the throughput is
> clearly lower than in the normal situation.
>
> By setting is_cwnd_limited to true when the small queue check fails,
> we avoid this issue: the cwnd can increase to a reasonable size (about
> 4000 in my test environment), and the small queue check no longer
> fails.
>
> Signed-off-by: Peng Yu <peng.yu@...baba-inc.com>
> ---
> net/ipv4/tcp_output.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index b94efb3050d2..8c70acf3a060 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2985,8 +2985,10 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
>  		    unlikely(tso_fragment(sk, skb, limit, mss_now, gfp)))
>  			break;
>  
> -		if (tcp_small_queue_check(sk, skb, 0))
> +		if (tcp_small_queue_check(sk, skb, 0)) {
> +			is_cwnd_limited = true;
>  			break;
> +		}
>  
>  		/* Argh, we hit an empty skb(), presumably a thread
>  		 * is sleeping in sendmsg()/sk_stream_wait_memory().
> --
> 2.47.3
Sorry, this makes no sense to me. CWND_LIMITED should not be hijacked.
Something else is preventing your flows from reaching nominal speed,
because we have not seen anything like that.
It is probably a driver issue or a receive-side issue. Instead of
trying to work around the issue, please root-cause it.
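
For reference, the limit arithmetic in the quoted report can be reproduced
with a minimal C sketch. It assumes the pacing rate from the ss output is
converted to bytes per second and then shifted right by 10 (the division by
1024 in the quoted calculation). The helper names tsq_limit_bytes() and
tsq_would_throttle() are hypothetical, and the real tcp_small_queue_check()
applies additional bounds that are omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: reproduce the limit arithmetic from the quoted
 * report. pacing_rate_bps is the "pacing_rate" value printed by ss
 * (bits per second); pacing_shift is assumed to be 10, matching the
 * division by 1024 in the report.
 */
static uint64_t tsq_limit_bytes(uint64_t pacing_rate_bps,
				unsigned int pacing_shift)
{
	uint64_t bytes_per_sec = pacing_rate_bps / 8;

	return bytes_per_sec >> pacing_shift;
}

/*
 * Hypothetical helper: the report treats the check as failing when
 * the queued bytes (wmem_alloc, the "t" field of skmem) exceed the
 * limit computed above.
 */
static bool tsq_would_throttle(uint64_t wmem_alloc, uint64_t limit)
{
	return wmem_alloc > limit;
}
```

With the reported values, tsq_limit_bytes(27764216, 10) gives 3389, and
tsq_would_throttle(7712, 3389) is true, matching "3389 < 7712" above.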