[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iL8YKZZQZSmg5WqrYVtyd2PanNXzTZ2Z0cObpv9_XSmoQ@mail.gmail.com>
Date: Tue, 14 Oct 2025 01:54:24 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Simon Horman <horms@...nel.org>, Neal Cardwell <ncardwell@...gle.com>,
Willem de Bruijn <willemb@...gle.com>, Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org,
eric.dumazet@...il.com
Subject: Re: [PATCH net-next] tcp: better handle TCP_TX_DELAY on established flows
On Tue, Oct 14, 2025 at 1:29 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Oct 14, 2025 at 1:22 AM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> > On 10/13/25 4:59 PM, Eric Dumazet wrote:
> > > Some applications uses TCP_TX_DELAY socket option after TCP flow
> > > is established.
> > >
> > > Some metrics need to be updated, otherwise TCP might take time to
> > > adapt to the new (emulated) RTT.
> > >
> > > This patch adjusts tp->srtt_us, tp->rtt_min, icsk_rto
> > > and sk->sk_pacing_rate.
> > >
> > > This is best effort, and for instance icsk_rto is reset
> > > without taking backoff into account.
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> >
> > The CI is consistently reporting pktdrill failures on top of this patch:
> >
> > # selftests: net/packetdrill: tcp_user_timeout_user-timeout-probe.pkt
> > # TAP version 13
> > # 1..2
> > # tcp_user_timeout_user-timeout-probe.pkt:35: error in Python code
> > # Traceback (most recent call last):
> > # File "/tmp/code_T7S7S4", line 202, in <module>
> > # assert tcpi_probes == 6, tcpi_probes; \
> > # AssertionError: 0
> > # tcp_user_timeout_user-timeout-probe.pkt: error executing code:
> > 'python3' returned non-zero status 1
> >
> > To be accurate, the patches batch under tests also includes:
> >
> > https://patchwork.kernel.org/project/netdevbpf/list/?series=1010780
> >
> > but the latter looks even more unlikely to cause the reported issues?!?
Not sure, look at the packetdrill test "`tc qdisc delete dev tun0 root
2>/dev/null ; tc qdisc add dev tun0 root pfifo limit 0`"
After "net: dev_queue_xmit() llist adoption" __dev_xmit_skb() might
return NET_XMIT_SUCCESS instead of NET_XMIT_DROP
__tcp_transmit_skb() has some code to detect NET_XMIT_DROP
immediately, instead of relying on a timer.
I can fix the 'single packet' case, but not the case of many packets
being sent in //
Note this issue was there already, for qdisc with TCQ_F_CAN_BYPASS :
We were returning NET_XMIT_SUCCESS even if the driver had to drop the packet.
Test is flaky even without the
https://patchwork.kernel.org/project/netdevbpf/list/?series=1010780
series.
Powered by blists - more mailing lists