Message-ID: <CADVnQy=mrWeWWTV9YpTaH7G9QvW-qOd_VH5B4=vTxR6rZKwe4A@mail.gmail.com>
Date: Wed, 25 Jun 2025 16:19:01 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Eric Wheeler <netdev@...ts.ewheeler.net>
Cc: netdev@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>, 
	Geumhwan Yu <geumhwan.yu@...sung.com>, Jakub Kicinski <kuba@...nel.org>, 
	Sasha Levin <sashal@...nel.org>, Yuchung Cheng <ycheng@...gle.com>, stable@...nel.org
Subject: Re: [BISECT] regression: tcp: fix to allow timestamp undo if no
 retransmits were sent

On Wed, Jun 25, 2025 at 3:17 PM Eric Wheeler <netdev@...ts.ewheeler.net> wrote:
>
> On Wed, 18 Jun 2025, Eric Wheeler wrote:
> > On Mon, 16 Jun 2025, Neal Cardwell wrote:
> > > On Mon, Jun 16, 2025 at 4:14 PM Eric Wheeler <netdev@...ts.ewheeler.net> wrote:
> > > > On Sun, 15 Jun 2025, Eric Wheeler wrote:
> > > > > On Tue, 10 Jun 2025, Neal Cardwell wrote:
> > > > > > On Mon, Jun 9, 2025 at 1:45 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > > > > > On Sat, Jun 7, 2025 at 7:26 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > > > > > > On Sat, Jun 7, 2025 at 6:54 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > > > > > > > On Sat, Jun 7, 2025 at 3:13 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > > > > > > > > On Fri, Jun 6, 2025 at 6:34 PM Eric Wheeler <netdev@...ts.ewheeler.net> wrote:
> > > > > > > > > > > On Fri, 6 Jun 2025, Neal Cardwell wrote:
> > > > > > > > > > > > On Thu, Jun 5, 2025 at 9:33 PM Eric Wheeler <netdev@...ts.ewheeler.net> wrote:
> > > > > > > > > > > > > After upgrading to Linux v6.6.85 on an older Supermicro SYS-2026T-6RFT+
> > > > > > > > > > > > > with an Intel 82599ES 10GbE NIC (ixgbe) linked to a Netgear GS728TXS at
> > > > > > > > > > > > > 10GbE via one SFP+ DAC (no bonding), we found TCP performance with
> > > > > > > > > > > > > existing devices on 1Gbit ports was <60Mbit; however, TCP with devices
> > > > > > > > > > > > > across the switch on 10Gbit ports runs at full 10GbE.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Through bisection, we found this first-bad commit:
> > > > > > > > > > > > >
> > > > > > > > > > > > >         tcp: fix to allow timestamp undo if no retransmits were sent
> > > > > > > > > > > > >                 upstream:       e37ab7373696e650d3b6262a5b882aadad69bb9e
> > > > > > > > > > > > >                 stable 6.6.y:   e676ca60ad2a6fdeb718b5e7a337a8fb1591d45f
> > > > > >
> > > > >
> > > > > > The attached patch should apply (with "git am") for any recent kernel
> > > > > > that has the "tcp: fix to allow timestamp undo if no retransmits were
> > > > > > sent" patch it is fixing. So you should be able to test it on top of
> > > > > > the 6.6 stable or 6.15 stable kernels you used earlier. Whichever is
> > > > > > easier.
> > > >
> > > > Definitely better, but performance is ~15% slower vs reverting, and the
> > > > retransmit counts are still higher than the other.  In the two sections
> > > > below you can see the difference between after the fix and after the
> > > > revert.
> > > >
> > >
> > > Would you have cycles to run the "after-fix" and "after-revert-6.6.93"
> > > cases multiple times, so we can get a sense of what is signal and what
> > > is noise? Perhaps 20 or 50 trials for each approach?
> >
> > I ran 50 tests after the revert and compared them to 50 tests after
> > the fix, using both the arithmetic and geometric mean, and the fix
> > still appears to be slightly slower than the revert alone:
> >
> >       # after-revert-6.6.93
> >       Arithmetic Mean: 843.64 Mbits/sec
> >       Geometric Mean: 841.95 Mbits/sec
> >
> >       # after-tcp-fix-6.6.93
> >       Arithmetic Mean: 823.00 Mbits/sec
> >       Geometric Mean: 819.38 Mbits/sec
> >
>
> Re-sending this question in case this message got lost:
>
> > Do you think that this is an actual performance regression, or just a
> > sample set that is not big enough to work out the averages?
> >
> > Here is the data collected for each of the 50 tests:
> >       - https://www.linuxglobal.com/out/for-neal/after-revert-6.6.93.tar.gz
> >       - https://www.linuxglobal.com/out/for-neal/after-tcp-fix-6.6.93.tar.gz

Hi Eric,

Many thanks for this great data!

I have been looking at this data. It's quite interesting.

Looking at the CDF of throughputs for the "revert" cases vs the "fix"
cases (attached), it does look like at the 70th percentile and below
(the 70% of trials with the lowest throughput) the "fix" cases have
lower throughput, and IMHO the gap looks outside the realm of what we
would expect from noise.
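
For anyone who wants to reproduce this kind of percentile comparison
from the per-trial throughput numbers, here is a minimal sketch. It is
not the script used for the attached plot; the function names and the
choice of linear interpolation between order statistics are
assumptions made for illustration.

```python
def quantile(samples, q):
    """Return the q-quantile (0 <= q <= 1) of samples.

    Uses linear interpolation between neighboring order statistics.
    """
    xs = sorted(samples)
    if not xs:
        raise ValueError("need at least one sample")
    pos = q * (len(xs) - 1)           # fractional index into sorted data
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def compare_quantiles(revert, fix, qs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Return (q, revert_quantile, fix_quantile) for each q in qs."""
    return [(q, quantile(revert, q), quantile(fix, q)) for q in qs]
```

Feeding the 50 per-trial throughputs (in Mbits/sec) of each kernel to
compare_quantiles() gives a table that can be read row by row: a "fix"
value consistently below the "revert" value at the lower quantiles
corresponds to the gap visible in the CDF.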

However, when I look at the traces, I don't see any reason why the
"fix" cases would be systematically slower. In particular, the "fix"
and "revert" cases differ only in a function used for "undo"
decisions, but in both the "fix" and "revert" cases there are no
"undo" events, and I don't see cases with spurious retransmissions
where there should have been "undo" events and yet there were not.

Visually inspecting the traces, the dominant determinant of
performance seems to be how many RTO events there were. For example,
the worst case for the "fix" trials has 16 RTOs, whereas the worst
case for the "revert" trials has 13 RTOs. And the number of RTO events
per trial looks random; I see similar qualitative patterns between
"fix" and "revert" cases, and don't see any reason why there are more
RTOs in the "fix" cases than the "revert" cases. All the RTOs seem to
be due to pre-existing (longstanding) performance problems in non-SACK
loss recovery.

One way to proceed would be for me to offer some performance fixes for
the RTOs, so we can get rid of the RTOs, which are the biggest source
of performance variation. That should greatly reduce noise, and
perhaps make it easier to see if there is any real difference between
"fix" and "revert" cases.
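
One way to judge whether a difference in mean throughput between two
sets of trials is larger than noise is a permutation test. The sketch
below is only an illustration of that general technique, not something
that was run on the data in this thread; the function name, the number
of shuffles, and the fixed seed are all assumptions.

```python
import random


def permutation_test(a, b, trials=10000, seed=0):
    """Two-sided permutation test on the difference of means.

    Returns (observed_difference, p_value), where the difference is
    mean(a) - mean(b).
    """
    rng = random.Random(seed)         # fixed seed for repeatable results
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    n = len(a)
    extreme = 0
    for _ in range(trials):
        # Randomly relabel the trials and recompute the difference.
        rng.shuffle(pooled)
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / len(b)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (extreme + 1) / (trials + 1)
```

With the per-trial throughputs of the two kernels as a and b, a small
p-value would suggest the difference is unlikely to be explained by
random variation between trials alone.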

We could compare the following two kernels, with another 50 tests for
each:

+ (a) 6.6.93 + {2 patches to fix RTOs} + "revert"
+ (b) 6.6.93 + {2 patches to fix RTOs} + "fix"

where:

"revert" = revert e37ab7373696 ("tcp: fix to allow timestamp undo if
no retransmits were sent")
"fix" = apply d0fa59897e04 ("tcp: fix tcp_packet_delayed() for
tcp_is_non_sack_preventing_reopen() behavior")

This would have the side benefit of testing some performance
improvements for non-SACK connections.

Are you up for that? :-)

Best regards,
neal
