Message-ID: <CADVnQy=+j339MteN3+aGqACngWi4Z7TMr+qsbcXF8Te7gDR9Dw@mail.gmail.com>
Date: Thu, 28 Aug 2025 16:51:50 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "Ahmed, Shehab Sarar" <shehaba2@...inois.edu>, 
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "kuniyu@...gle.com" <kuniyu@...gle.com>
Subject: Re: [BUG] TCP: Duplicate ACK storm after reordering with delayed
 packet (BBR RTO triggered)

On Wed, Aug 27, 2025 at 11:16 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Wed, Aug 27, 2025 at 6:12 PM Ahmed, Shehab Sarar
> <shehaba2@...inois.edu> wrote:
> >
> > Hello,
> >
> > I am a PhD student doing research on adversarial testing of different TCP protocols. Recently, I found an interesting behavior of TCP that I am describing below:
> >
> > The network RTT was high for about a second before it was abruptly reduced. Some packets sent during the high-RTT phase took a long time to reach the destination, while later packets, benefiting from the lower RTT, arrived earlier. This out-of-order arrival caused the receiver to generate duplicate acknowledgments (dup ACKs). Because of the low RTT, these dup ACKs quickly reached the sender. Upon receiving three dup ACKs, the sender fast-retransmitted an earlier packet that was not lost but was simply taking longer to arrive. Interestingly, even though the fast-retransmitted packet experienced a lower RTT, the original delayed packet still arrived first. When the receiver got this packet, it sent an ACK for the next packet in sequence. However, when the fast-retransmitted packet arrived later, something went wrong in the receiver's logic for updating the acknowledgment number. As a result, even after the next expected packet was received, the acknowledgment number was not updated correctly. The receiver kept sending dup ACKs, ultimately forcing the congestion control into the retransmission timeout (RTO) phase.
> >
> > I observed this behavior on Linux kernel 5.4.230 and was wondering whether the same issue persists in the most recent kernel. Do you know of any commit that addressed it? If not, I would be eager to investigate further. My suspicion is that the problem lies in tcp_input.c. I look forward to your reply.
>
> I really wonder why anyone would do any research on v5.4.230, a kernel
> more than 2 years old and clearly unsupported.
>
> I suggest you write a packetdrill test to exhibit the issue, then run
> a reverse bisection to find the commit fixing it (assuming recent
> kernels are fixed).
>
> There are about 8200 patches between v5.4.230 and v5.4.296, so a
> bisection should be fast.
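Eric's point that a bisection "should be fast" rests on binary search being logarithmic in the number of candidate commits. Taking his figure of about 8200 patches at face value, and assuming a roughly linear history with no untestable commits that would have to be skipped, the number of build-and-test rounds is bounded as follows:

```latex
2^{13} = 8192 < 8200 \le 16384 = 2^{14}
\quad\Longrightarrow\quad
\lceil \log_2 8200 \rceil = 14 \ \text{rounds at most}
```

So roughly 13 to 14 kernel builds, each followed by one run of the reproducer, should narrow the range down to a single commit.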

Thanks for your report, Shehab.

I agree with Eric's suggestion to try writing a packetdrill test case
for this, so we have a reproducer for the behavior, and if there is a
bug we can create a regression test for Linux TCP with that.
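To make the expected receiver behavior concrete for such a regression test, here is a toy model of cumulative-ACK logic in Python. It is a hypothetical sketch, not Linux's tcp_input.c: `ToyReceiver`, `MSS`, and `replay` are illustrative names, and the model ignores SACK, D-SACK, delayed ACKs, and window limits. It replays the arrival order from the report using 1000-byte segments. Segment 1 is delayed, segments 2 to 5 overtake it, then the delayed original arrives, and finally the fast retransmission of segment 1 arrives.

```python
from dataclasses import dataclass, field

MSS = 1000  # hypothetical fixed segment size in bytes


@dataclass
class ToyReceiver:
    """Hypothetical, simplified cumulative-ACK receiver (no SACK, no delayed ACKs)."""
    rcv_nxt: int = 0                         # next in-order byte expected
    ooo: dict = field(default_factory=dict)  # buffered out-of-order data: seq -> length

    def receive(self, seq: int, length: int) -> int:
        """Process one segment and return the cumulative ACK number sent back."""
        end = seq + length
        if end <= self.rcv_nxt:
            # Entirely old data (e.g. a spurious retransmission): just re-ACK.
            # The ACK number must never move backwards or stall here.
            return self.rcv_nxt
        if seq > self.rcv_nxt:
            # A hole precedes this segment: buffer it and send a duplicate ACK.
            self.ooo[seq] = max(self.ooo.get(seq, 0), length)
            return self.rcv_nxt
        # In-order (or partially overlapping) data: advance past it ...
        self.rcv_nxt = end
        # ... then absorb any buffered segments that have become contiguous.
        while True:
            ready = [s for s in self.ooo if s <= self.rcv_nxt]
            if not ready:
                break
            for s in ready:
                self.rcv_nxt = max(self.rcv_nxt, s + self.ooo.pop(s))
        return self.rcv_nxt


def replay(arrivals):
    """Feed MSS-sized segments, given by starting sequence number, in arrival order."""
    rx = ToyReceiver()
    return [rx.receive(seq, MSS) for seq in arrivals]


# Segment at 1000 is delayed; 2000..5000 overtake it; the delayed original
# arrives; then the fast retransmission of the same segment arrives.
acks = replay([0, 2000, 3000, 4000, 5000, 1000, 1000])
print(acks)  # prints [1000, 1000, 1000, 1000, 1000, 6000, 6000]
```

The four ACKs of 1000 after the first one are duplicate ACKs, and the third of them is what would trigger fast retransmit at the sender. In this model the late, redundant retransmission only re-ACKs 6000 (a real Linux receiver typically also attaches a D-SACK block when SACK is negotiated). The ACK number never falls back to or sticks at 2000, which is the invariant the reported trace appears to violate.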

Shehab, while you are working on a packetdrill reproducer of this
case, if you can share a binary tcpdump .pcap trace of such a
scenario, that would be very useful. From your detailed description it
sounds like you have such a trace. If you can share it, that would be
great. A visualization with tcptrace or similar tools may be easier
for us to parse than this English prose description. ;-)

best regards,
neal
