Message-ID: <294fe4ea-eb6c-3dc3-9c5-66f69514bc94@ewheeler.net>
Date: Wed, 25 Jun 2025 16:15:19 -0700 (PDT)
From: Eric Wheeler <netdev@...ts.ewheeler.net>
To: Neal Cardwell <ncardwell@...gle.com>
cc: netdev@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>, 
    Geumhwan Yu <geumhwan.yu@...sung.com>, Jakub Kicinski <kuba@...nel.org>, 
    Sasha Levin <sashal@...nel.org>, Yuchung Cheng <ycheng@...gle.com>, 
    stable@...nel.org
Subject: Re: [BISECT] regression: tcp: fix to allow timestamp undo if no
 retransmits were sent

On Wed, 25 Jun 2025, Neal Cardwell wrote:
> On Wed, Jun 25, 2025 at 3:17 PM Eric Wheeler <netdev@...ts.ewheeler.net> wrote:
> >
> > On Wed, 18 Jun 2025, Eric Wheeler wrote:
> > > On Mon, 16 Jun 2025, Neal Cardwell wrote:
> > > > On Mon, Jun 16, 2025 at 4:14 PM Eric Wheeler <netdev@...ts.ewheeler.net> wrote:
> > > > > On Sun, 15 Jun 2025, Eric Wheeler wrote:
> > > > > > On Tue, 10 Jun 2025, Neal Cardwell wrote:
> > > > > > > On Mon, Jun 9, 2025 at 1:45 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > > > > > > On Sat, Jun 7, 2025 at 7:26 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > > > > > > > On Sat, Jun 7, 2025 at 6:54 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > > > > > > > > On Sat, Jun 7, 2025 at 3:13 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > > > > > > > > > On Fri, Jun 6, 2025 at 6:34 PM Eric Wheeler <netdev@...ts.ewheeler.net> wrote:
> > > > > > > > > > > > On Fri, 6 Jun 2025, Neal Cardwell wrote:
> > > > > > > > > > > > > On Thu, Jun 5, 2025 at 9:33 PM Eric Wheeler <netdev@...ts.ewheeler.net> wrote:
> > > > > > > > > > > > > > After upgrading to Linux v6.6.85 on an older Supermicro SYS-2026T-6RFT+
> > > > > > > > > > > > > > with an Intel 82599ES 10GbE NIC (ixgbe) linked to a Netgear GS728TXS at
> > > > > > > > > > > > > > 10GbE via one SFP+ DAC (no bonding), we found TCP performance with
> > > > > > > > > > > > > > existing devices on 1Gbit ports was <60Mbit; however, TCP with devices
> > > > > > > > > > > > > > across the switch on 10Gbit ports runs at full 10GbE.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Through bisection, we found this first-bad commit:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         tcp: fix to allow timestamp undo if no retransmits were sent
> > > > > > > > > > > > > >                 upstream:       e37ab7373696e650d3b6262a5b882aadad69bb9e
> > > > > > > > > > > > > >                 stable 6.6.y:   e676ca60ad2a6fdeb718b5e7a337a8fb1591d45f
> > > > > > >
> > > > > >
> > > > > > > The attached patch should apply (with "git am") for any recent kernel
> > > > > > > that has the "tcp: fix to allow timestamp undo if no retransmits were
> > > > > > > sent" patch it is fixing. So you should be able to test it on top of
> > > > > > > the 6.6 stable or 6.15 stable kernels you used earlier. Whichever is
> > > > > > > easier.
> > > > >
> > > > > Definitely better, but performance is ~15% slower vs reverting, and the
> > > > > retransmit counts are still higher than with the revert.  In the two
> > > > > sections below you can see the difference between after the fix and
> > > > > after the revert.
> > > > >
> > > >
> > > > Would you have cycles to run the "after-fix" and "after-revert-6.6.93"
> > > > cases multiple times, so we can get a sense of what is signal and what
> > > > is noise? Perhaps 20 or 50 trials for each approach?
> > >
> > > I ran 50 tests after the revert and compared them to 50 after the fix,
> > > using both the arithmetic and geometric means, and the fix still appears
> > > to be slightly slower than the revert alone:
> > >
> > >       # after-revert-6.6.93
> > >       Arithmetic Mean: 843.64 Mbits/sec
> > >       Geometric Mean: 841.95 Mbits/sec
> > >
> > >       # after-tcp-fix-6.6.93
> > >       Arithmetic Mean: 823.00 Mbits/sec
> > >       Geometric Mean: 819.38 Mbits/sec
> > >
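For anyone re-deriving these figures from the raw trials, here is a
minimal sketch of the two means. It assumes one throughput value in
Mbits/sec per trial; the function names and the sample values are made up
for illustration and are not taken from the data set.

```python
import math


def arithmetic_mean(samples):
    """Plain average of the per-trial throughputs."""
    return sum(samples) / len(samples)


def geometric_mean(samples):
    """exp(mean(log(x))); every sample must be strictly positive."""
    return math.exp(sum(math.log(x) for x in samples) / len(samples))


# Hypothetical throughputs in Mbits/sec, NOT values from the real trials.
example = [800.0, 850.0, 900.0]
print(f"Arithmetic Mean: {arithmetic_mean(example):.2f} Mbits/sec")
print(f"Geometric Mean: {geometric_mean(example):.2f} Mbits/sec")
```

The geometric mean is never larger than the arithmetic mean, and the gap
between them grows with the spread of the samples, so a wider gap may hint
at more trial-to-trial variation.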
> >
> > Re-sending this question in case this message got lost:
> >
> > > Do you think that this is an actual performance regression, or just a
> > > sample set that is not big enough to work out the averages?
> > >
> > > Here is the data collected for each of the 50 tests:
> > >       - https://www.linuxglobal.com/out/for-neal/after-revert-6.6.93.tar.gz
> > >       - https://www.linuxglobal.com/out/for-neal/after-tcp-fix-6.6.93.tar.gz
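One hedged way to approach the signal-versus-noise question is a
permutation test on the two sets of per-trial throughputs. This is only a
sketch under assumptions: the function name, the number of shuffles, and
the seed are illustrative, and it looks only at the difference in means.

```python
import random


def permutation_test(a, b, shuffles=10000, seed=0):
    """Two-sided p-value for the difference in means of a and b.

    Labels are shuffled at random; the p-value is the share of shuffles
    whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)
        xa = pooled[:len(a)]
        xb = pooled[len(a):]
        if abs(sum(xa) / len(xa) - sum(xb) / len(xb)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (shuffles + 1)
```

A small p-value would suggest the gap between the two kernels is unlikely
to come from trial-to-trial noise alone; a large one would suggest the 50
trials per kernel cannot tell them apart.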
> 
> Hi Eric,
> 
> Many thanks for this great data!
> 
> I have been looking at this data. It's quite interesting.
> 
> Looking at the CDF of throughputs for the "revert" cases vs the "fix"
> cases (attached) it does look like for the 70-th percentile and below
> (the 70% of most unlucky cases), the "fix" cases have a throughput
> that is lower, and IMHO this looks outside the realm of what we would
> expect from noise.
> 
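For readers without the attachment, here is a rough sketch of how such a
CDF comparison can be built from per-trial throughputs. The helper names
and the nearest-rank percentile definition are assumptions for
illustration, not necessarily how the attached plot was produced.

```python
def empirical_cdf(samples):
    """Return (value, fraction of samples <= value) pairs, sorted by value."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100 avoids floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]
```

Comparing percentile(revert, p) with percentile(fix, p) for p up to 70
would put a number on the gap seen in the lower part of the CDF.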
> However, when I look at the traces, I don't see any reason why the
> "fix" cases would be systematically slower. In particular, the "fix"
> and "revert" cases are only changing a function used for "undo"
> decisions, but for both the "fix" or "revert" cases, there are no
> "undo" events, and I don't see cases with spurious retransmissions
> where there should have been "undo" events and yet there were not.
> 
> Visually inspecting the traces, the dominant determinant of
> performance seems to be how many RTO events there were. For example,
> the worst case for the "fix" trials has 16 RTOs, whereas the worst
> case for the "revert" trials has 13 RTOs. And the number of RTO events
> per trial looks random; I see similar qualitative patterns between
> "fix" and "revert" cases, and don't see any reason why there are more
> RTOs in the "fix" cases than the "revert" cases. All the RTOs seem to
> be due to pre-existing (longstanding) performance problems in non-SACK
> loss recovery.
> 
> One way to proceed would be for me to offer some performance fixes for
> the RTOs, so we can get rid of the RTOs, which are the biggest source
> of performance variation. That should greatly reduce noise, and
> perhaps make it easier to see if there is any real difference between
> "fix" and "revert" cases.
> 
> We could compare the following two kernels, with another 50 tests for
> each:
> 
> + (a) 6.6.93 + {2 patches to fix RTOs} + "revert"
> + (b) 6.6.93 + {2 patches to fix RTOs} + "fix"
> 
> where:
> 
> "revert" =  revert e37ab7373696 ("tcp: fix to allow timestamp undo if
> no retransmits were sent")
> "fix" = apply d0fa59897e04 ("tcp: fix tcp_packet_delayed() for
> tcp_is_non_sack_preventing_reopen() behavior")
> 
> This would have the side benefit of testing some performance
> improvements for non-SACK connections.
> 
> Are you up for that? :-)


Sure, if you have some patch ideas in mind, I'm all for getting patches 
merged that improve performance.

BTW, what causes a non-SACK connection?  The RX side is a near-idle Linux 
6.8 host with default sysctl settings.
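
As far as I understand it, SACK is only used when both ends advertise the
SACK-permitted option during the handshake, so one thing that could be
checked on each host is the net.ipv4.tcp_sack sysctl. A minimal sketch,
assuming the usual procfs path; the function name is made up for
illustration:

```python
from pathlib import Path


def sack_enabled(sysctl_path="/proc/sys/net/ipv4/tcp_sack"):
    """Return True if the tcp_sack sysctl file holds a nonzero value.

    The default path is the usual Linux procfs location; pass another
    path to read a saved copy taken from a different host.
    """
    return Path(sysctl_path).read_text().strip() != "0"
```

This only reports the local setting. Whether a given connection actually
negotiated SACK shows up in the SYN and SYN-ACK options of a packet
capture (tcpdump prints "sackOK" when the option is present).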


--
Eric Wheeler


> 
> Best regards,
> neal
> 
