Message-ID: <CANn89iKr8P7_qesQ1LVKibuETCEaP6mNC-yjmymmRvtzLibzfA@mail.gmail.com>
Date: Tue, 14 Oct 2025 02:38:35 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Simon Horman <horms@...nel.org>, Neal Cardwell <ncardwell@...gle.com>,
Willem de Bruijn <willemb@...gle.com>, Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org,
eric.dumazet@...il.com
Subject: Re: [PATCH net-next] tcp: better handle TCP_TX_DELAY on established flows
On Tue, Oct 14, 2025 at 1:54 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Oct 14, 2025 at 1:29 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Tue, Oct 14, 2025 at 1:22 AM Paolo Abeni <pabeni@...hat.com> wrote:
> > >
> > > On 10/13/25 4:59 PM, Eric Dumazet wrote:
> > > > Some applications use the TCP_TX_DELAY socket option after the
> > > > TCP flow is established.
> > > >
> > > > Some metrics need to be updated, otherwise TCP might take time to
> > > > adapt to the new (emulated) RTT.
> > > >
> > > > This patch adjusts tp->srtt_us, tp->rtt_min, icsk_rto
> > > > and sk->sk_pacing_rate.
> > > >
> > > > This is best effort, and for instance icsk_rto is reset
> > > > without taking backoff into account.
> > > >
> > > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > >
> > > The CI is consistently reporting packetdrill failures on top of this patch:
> > >
> > > # selftests: net/packetdrill: tcp_user_timeout_user-timeout-probe.pkt
> > > # TAP version 13
> > > # 1..2
> > > # tcp_user_timeout_user-timeout-probe.pkt:35: error in Python code
> > > # Traceback (most recent call last):
> > > # File "/tmp/code_T7S7S4", line 202, in <module>
> > > # assert tcpi_probes == 6, tcpi_probes; \
> > > # AssertionError: 0
> > > # tcp_user_timeout_user-timeout-probe.pkt: error executing code:
> > > # 'python3' returned non-zero status 1
> > >
> > > To be accurate, the patch batch under test also includes:
> > >
> > > https://patchwork.kernel.org/project/netdevbpf/list/?series=1010780
> > >
> > > but the latter looks even less likely to cause the reported issues?!?
>
> Not sure, look at the packetdrill test
> "`tc qdisc delete dev tun0 root 2>/dev/null ; tc qdisc add dev tun0 root pfifo limit 0`"
>
> After "net: dev_queue_xmit() llist adoption" __dev_xmit_skb() might
> return NET_XMIT_SUCCESS instead of NET_XMIT_DROP
>
> __tcp_transmit_skb() has some code to detect NET_XMIT_DROP
> immediately, instead of relying on a timer.
>
> I can fix the 'single packet' case, but not the case of many packets
> being sent in parallel.
>
> Note this issue was already there for qdiscs with TCQ_F_CAN_BYPASS:
> We were returning NET_XMIT_SUCCESS even if the driver had to drop the packet.
>
> Test is flaky even without the
> https://patchwork.kernel.org/project/netdevbpf/list/?series=1010780
> series.
Test flakiness can be fixed with
diff --git a/tools/testing/selftests/net/packetdrill/tcp_user_timeout_user-timeout-probe.pkt
b/tools/testing/selftests/net/packetdrill/tcp_user_timeout_user-timeout-probe.pkt
index 183051ba0cae..71f7a75a733b 100644
--- a/tools/testing/selftests/net/packetdrill/tcp_user_timeout_user-timeout-probe.pkt
+++ b/tools/testing/selftests/net/packetdrill/tcp_user_timeout_user-timeout-probe.pkt
@@ -7,6 +7,8 @@
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+// install a pfifo qdisc
+ +0 `tc qdisc delete dev tun0 root 2>/dev/null ; tc qdisc add dev tun0 root pfifo limit 10`
+0 < S 0:0(0) win 0 <mss 1460>
+0 > S. 0:0(0) ack 1 <mss 1460>
@@ -21,16 +23,18 @@
+0 %{ assert tcpi_probes == 0, tcpi_probes; \
assert tcpi_backoff == 0, tcpi_backoff }%
-// install a qdisc dropping all packets
- +0 `tc qdisc delete dev tun0 root 2>/dev/null ; tc qdisc add dev tun0 root pfifo limit 0`
+// Tune pfifo limit to 0. A single tc command is less disruptive in VM tests.
+ +0 `tc qdisc change dev tun0 root pfifo limit 0`
+
+0 write(4, ..., 24) = 24
// When qdisc is congested we retry every 500ms
// (TCP_RESOURCE_PROBE_INTERVAL) and therefore
// we retry 6 times before hitting 3s timeout.
// First verify that the connection is alive:
-+3.250 write(4, ..., 24) = 24
++3 write(4, ..., 24) = 24
+
// Now verify that shortly after that the socket is dead:
- +.100 write(4, ..., 24) = -1 ETIMEDOUT (Connection timed out)
++1 write(4, ..., 24) = -1 ETIMEDOUT (Connection timed out)
+0 %{ assert tcpi_probes == 6, tcpi_probes; \
assert tcpi_backoff == 0, tcpi_backoff }%