Message-ID: <9c82e38f-8253-3e41-a5f-dfbb261165ca@ewheeler.net>
Date: Wed, 25 Jun 2025 12:17:11 -0700 (PDT)
From: Eric Wheeler <netdev@...ts.ewheeler.net>
To: Neal Cardwell <ncardwell@...gle.com>
cc: netdev@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>,
Geumhwan Yu <geumhwan.yu@...sung.com>, Jakub Kicinski <kuba@...nel.org>,
Sasha Levin <sashal@...nel.org>, Yuchung Cheng <ycheng@...gle.com>,
stable@...nel.org
Subject: Re: [BISECT] regression: tcp: fix to allow timestamp undo if no
retransmits were sent
On Wed, 18 Jun 2025, Eric Wheeler wrote:
> On Mon, 16 Jun 2025, Neal Cardwell wrote:
> > On Mon, Jun 16, 2025 at 4:14 PM Eric Wheeler <netdev@...ts.ewheeler.net> wrote:
> > > On Sun, 15 Jun 2025, Eric Wheeler wrote:
> > > > On Tue, 10 Jun 2025, Neal Cardwell wrote:
> > > > > On Mon, Jun 9, 2025 at 1:45 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > > > > On Sat, Jun 7, 2025 at 7:26 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > > > > > On Sat, Jun 7, 2025 at 6:54 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > > > > > > On Sat, Jun 7, 2025 at 3:13 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > > > > > > > On Fri, Jun 6, 2025 at 6:34 PM Eric Wheeler <netdev@...ts.ewheeler.net> wrote:
> > > > > > > > > > On Fri, 6 Jun 2025, Neal Cardwell wrote:
> > > > > > > > > > > On Thu, Jun 5, 2025 at 9:33 PM Eric Wheeler <netdev@...ts.ewheeler.net> wrote:
> > > > > > > > > > > > After upgrading to Linux v6.6.85 on an older Supermicro SYS-2026T-6RFT+
> > > > > > > > > > > > with an Intel 82599ES 10GbE NIC (ixgbe) linked to a Netgear GS728TXS at
> > > > > > > > > > > > 10GbE via one SFP+ DAC (no bonding), we found TCP performance with
> > > > > > > > > > > > existing devices on 1Gbit ports was <60Mbit; however, TCP with devices
> > > > > > > > > > > > across the switch on 10Gbit ports runs at full 10GbE.
> > > > > > > > > > > >
> > > > > > > > > > > > Through bisection, we found this first-bad commit:
> > > > > > > > > > > >
> > > > > > > > > > > > tcp: fix to allow timestamp undo if no retransmits were sent
> > > > > > > > > > > > upstream: e37ab7373696e650d3b6262a5b882aadad69bb9e
> > > > > > > > > > > > stable 6.6.y: e676ca60ad2a6fdeb718b5e7a337a8fb1591d45f
> > > > >
> > > >
> > > > > The attached patch should apply (with "git am") for any recent kernel
> > > > > that has the "tcp: fix to allow timestamp undo if no retransmits were
> > > > > sent" patch it is fixing. So you should be able to test it on top of
> > > > > the 6.6 stable or 6.15 stable kernels you used earlier. Whichever is
> > > > > easier.
> > >
> > > Definitely better, but performance is ~15% slower vs reverting, and the
> > > retransmit counts are still higher than with the revert. In the two sections
> > > below you can see the difference between after the fix and after the
> > > revert.
> > >
> >
> > Would you have cycles to run the "after-fix" and "after-revert-6.6.93"
> > cases multiple times, so we can get a sense of what is signal and what
> > is noise? Perhaps 20 or 50 trials for each approach?
>
> I ran 50 tests after the revert and compared them to the results after the
> fix using both the arithmetic and geometric mean, and the fix still appears
> to be slightly slower than the revert alone:
>
> # after-revert-6.6.93
> Arithmetic Mean: 843.64 Mbits/sec
> Geometric Mean: 841.95 Mbits/sec
>
> # after-tcp-fix-6.6.93
> Arithmetic Mean: 823.00 Mbits/sec
> Geometric Mean: 819.38 Mbits/sec
>
Re-sending this question in case this message got lost:
> Do you think that this is an actual performance regression, or just a
> sample set that is not big enough to work out the averages?
>
> Here is the data collected for each of the 50 tests:
> - https://www.linuxglobal.com/out/for-neal/after-revert-6.6.93.tar.gz
> - https://www.linuxglobal.com/out/for-neal/after-tcp-fix-6.6.93.tar.gz
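
For reference, the gap between the two arithmetic means above is
843.64 - 823.00 = 20.64 Mbits/sec, or about 2.4% of the after-revert figure.
One way to judge whether a gap that size is signal or noise is Welch's
t-test on the per-run throughputs. The following is only a minimal sketch
using the Python standard library: the function names are made up for
illustration, and the sample lists are placeholders, not the measurements
in the tarballs above.

```python
import math
import statistics


def summarize(samples):
    """Return (arithmetic mean, geometric mean, sample std deviation)."""
    return (
        statistics.mean(samples),
        statistics.geometric_mean(samples),
        statistics.stdev(samples),
    )


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Squared standard error of each sample mean.
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Placeholder throughputs in Mbits/sec, NOT the real data from this thread.
    after_revert = [850.0, 840.0, 845.0, 838.0, 852.0]
    after_fix = [825.0, 815.0, 830.0, 810.0, 835.0]

    for name, runs in (("after-revert", after_revert), ("after-fix", after_fix)):
        mean, gmean, sd = summarize(runs)
        print(f"{name}: mean={mean:.2f} gmean={gmean:.2f} stdev={sd:.2f}")

    t, df = welch_t(after_revert, after_fix)
    print(f"Welch t={t:.2f}, df={df:.1f}")
```

A large |t| relative to the critical value for the reported degrees of
freedom would suggest the difference is not just run-to-run variation; with
the real 50-run data sets the per-run values would replace the placeholder
lists.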
--
Eric Wheeler