Message-ID: <CANn89iKr8P7_qesQ1LVKibuETCEaP6mNC-yjmymmRvtzLibzfA@mail.gmail.com>
Date: Tue, 14 Oct 2025 02:38:35 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Simon Horman <horms@...nel.org>, Neal Cardwell <ncardwell@...gle.com>,
Willem de Bruijn <willemb@...gle.com>, Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org,
eric.dumazet@...il.com
Subject: Re: [PATCH net-next] tcp: better handle TCP_TX_DELAY on established flows
On Tue, Oct 14, 2025 at 1:54 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Oct 14, 2025 at 1:29 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Tue, Oct 14, 2025 at 1:22 AM Paolo Abeni <pabeni@...hat.com> wrote:
> > >
> > > On 10/13/25 4:59 PM, Eric Dumazet wrote:
> > > > Some applications use the TCP_TX_DELAY socket option after the
> > > > TCP flow is established.
> > > >
> > > > Some metrics need to be updated, otherwise TCP might take time to
> > > > adapt to the new (emulated) RTT.
> > > >
> > > > This patch adjusts tp->srtt_us, tp->rtt_min, icsk_rto
> > > > and sk->sk_pacing_rate.
> > > >
> > > > This is best effort, and for instance icsk_rto is reset
> > > > without taking backoff into account.
> > > >
> > > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > >
> > > The CI is consistently reporting packetdrill failures on top of this patch:
> > >
> > > # selftests: net/packetdrill: tcp_user_timeout_user-timeout-probe.pkt
> > > # TAP version 13
> > > # 1..2
> > > # tcp_user_timeout_user-timeout-probe.pkt:35: error in Python code
> > > # Traceback (most recent call last):
> > > # File "/tmp/code_T7S7S4", line 202, in <module>
> > > # assert tcpi_probes == 6, tcpi_probes; \
> > > # AssertionError: 0
> > > # tcp_user_timeout_user-timeout-probe.pkt: error executing code:
> > > # 'python3' returned non-zero status 1
> > >
> > > To be accurate, the patch batch under test also includes:
> > >
> > > https://patchwork.kernel.org/project/netdevbpf/list/?series=1010780
> > >
> > > but the latter looks even less likely to cause the reported issues?!?
>
> Not sure, look at the packetdrill test
> "`tc qdisc delete dev tun0 root 2>/dev/null ; tc qdisc add dev tun0 root pfifo limit 0`"
>
> After "net: dev_queue_xmit() llist adoption" __dev_xmit_skb() might
> return NET_XMIT_SUCCESS instead of NET_XMIT_DROP
>
> __tcp_transmit_skb() has some code to detect NET_XMIT_DROP
> immediately, instead of relying on a timer.
>
> I can fix the 'single packet' case, but not the case of many packets
> being sent in parallel.
>
> Note this issue was already there for qdiscs with TCQ_F_CAN_BYPASS:
> We were returning NET_XMIT_SUCCESS even if the driver had to drop the packet.
>
> Test is flaky even without the
> https://patchwork.kernel.org/project/netdevbpf/list/?series=1010780
> series.
Test flakiness can be fixed with
diff --git a/tools/testing/selftests/net/packetdrill/tcp_user_timeout_user-timeout-probe.pkt
b/tools/testing/selftests/net/packetdrill/tcp_user_timeout_user-timeout-probe.pkt
index 183051ba0cae..71f7a75a733b 100644
--- a/tools/testing/selftests/net/packetdrill/tcp_user_timeout_user-timeout-probe.pkt
+++ b/tools/testing/selftests/net/packetdrill/tcp_user_timeout_user-timeout-probe.pkt
@@ -7,6 +7,8 @@
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+// install a pfifo qdisc
+ +0 `tc qdisc delete dev tun0 root 2>/dev/null ; tc qdisc add dev tun0 root pfifo limit 10`
+0 < S 0:0(0) win 0 <mss 1460>
+0 > S. 0:0(0) ack 1 <mss 1460>
@@ -21,16 +23,18 @@
+0 %{ assert tcpi_probes == 0, tcpi_probes; \
assert tcpi_backoff == 0, tcpi_backoff }%
-// install a qdisc dropping all packets
- +0 `tc qdisc delete dev tun0 root 2>/dev/null ; tc qdisc add dev tun0 root pfifo limit 0`
+// Tune pfifo limit to 0. A single tc command is less disruptive in VM tests.
+ +0 `tc qdisc change dev tun0 root pfifo limit 0`
+
+0 write(4, ..., 24) = 24
// When qdisc is congested we retry every 500ms
// (TCP_RESOURCE_PROBE_INTERVAL) and therefore
// we retry 6 times before hitting 3s timeout.
// First verify that the connection is alive:
-+3.250 write(4, ..., 24) = 24
++3 write(4, ..., 24) = 24
+
// Now verify that shortly after that the socket is dead:
- +.100 write(4, ..., 24) = -1 ETIMEDOUT (Connection timed out)
++1 write(4, ..., 24) = -1 ETIMEDOUT (Connection timed out)
+0 %{ assert tcpi_probes == 6, tcpi_probes; \
assert tcpi_backoff == 0, tcpi_backoff }%