Message-ID: <88a51699-e913-4dba-992d-e923509ec754@kernel.org>
Date: Mon, 4 Aug 2025 11:58:54 +0200
From: Matthieu Baerts <matttbe@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>, willemb@...gle.com
Cc: netdev@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com,
andrew+netdev@...n.ch, horms@...nel.org, shuah@...nel.org,
linux-kselftest@...r.kernel.org, davem@...emloft.net
Subject: Re: [PATCH net] selftests: net: packetdrill: xfail all problems on
slow machines
Hi Jakub, Willem,
On 01/08/2025 20:16, Jakub Kicinski wrote:
> We keep seeing flakes on packetdrill on debug kernels, while
> non-debug kernels are stable, not a single flake in 200 runs.
> Time to give up, debug kernels appear to suffer from 10msec
> latency spikes and any timing-sensitive test is bound to flake.
Thank you for the patch!
Another solution might be to increase the tolerance, but I don't think
it would fix all issues. I quickly looked at the last 100 runs, and
most failures might be fixed by a higher tolerance, e.g.:
> # tcp_ooo-before-and-after-accept.pkt:19: timing error: expected inbound packet at 0.101619 sec but happened at 0.115894 sec; tolerance 0.014000 sec
(0.275ms above the limit!)
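To check the arithmetic behind that "0.275ms above the limit", here is a
small Python sketch that parses a packetdrill "timing error" line like the
one quoted above and reports by how much the packet missed the tolerance
window. The regex only targets the message format visible in this log;
the helper names are hypothetical and not part of packetdrill or the
selftests. Values are converted to integer microseconds so the subtraction
is exact.

```python
import re

# Hypothetical helper: matches the packetdrill timing error format quoted above.
TIMING_RE = re.compile(
    r"expected \S+ packet at (?P<exp>[\d.]+) sec "
    r"but happened at (?P<act>[\d.]+) sec; "
    r"tolerance (?P<tol>[\d.]+) sec"
)

def to_usecs(seconds: str) -> int:
    # Convert a decimal-seconds string to integer microseconds to avoid
    # floating-point noise in the subtraction below.
    return round(float(seconds) * 1_000_000)

def overshoot_usecs(line: str) -> int:
    """Return how many usecs the packet missed the tolerance window by."""
    m = TIMING_RE.search(line)
    if m is None:
        raise ValueError("not a packetdrill timing error line")
    delay = to_usecs(m["act"]) - to_usecs(m["exp"])
    return abs(delay) - to_usecs(m["tol"])

line = ("tcp_ooo-before-and-after-accept.pkt:19: timing error: expected inbound "
        "packet at 0.101619 sec but happened at 0.115894 sec; tolerance 0.014000 sec")
print(overshoot_usecs(line))  # 275, i.e. 0.275 ms
```

The packet arrived 14.275 ms late against a 14 ms tolerance, hence the
0.275 ms overshoot.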
On the MPTCP side, we used to have a very high tolerance with debug
kernels (>0.5s) back when public CIs had very limited CPU resources. I
guess a tolerance of 0.1s would be enough, but for the MPTCP
packetdrill tests, I set it to 0.2s with debug kernels, just to be on
the safe side.
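As a rough illustration of that choice, the sketch below picks the
tolerance based on whether the kernel is a debug build, then checks the
timing from the log line quoted above against it. The 14 ms default
comes from the tolerance in that error message and 0.2s is the MPTCP
debug-kernel value mentioned here; the function names and the symmetric
early/late window are assumptions for illustration only, not how the
selftest runner is actually written.

```python
# Hypothetical sketch: choose a packetdrill timing tolerance (usecs).
DEFAULT_TOLERANCE_USECS = 14_000   # tolerance seen in the TCP error message
DEBUG_TOLERANCE_USECS = 200_000    # 0.2s, used for MPTCP on debug kernels

def tolerance_usecs(debug_kernel: bool) -> int:
    return DEBUG_TOLERANCE_USECS if debug_kernel else DEFAULT_TOLERANCE_USECS

def within_tolerance(expected_us: int, actual_us: int, debug_kernel: bool) -> bool:
    # Assumes a symmetric window: too early or too late both count.
    return abs(actual_us - expected_us) <= tolerance_usecs(debug_kernel)

# Timing from the tcp_ooo-before-and-after-accept.pkt failure above.
print(within_tolerance(101_619, 115_894, debug_kernel=False))  # False
print(within_tolerance(101_619, 115_894, debug_kernel=True))   # True
```

A larger window absorbs latency spikes like this one, but, as noted
below, it does nothing for failures caused by unexpected retransmissions.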
Still, I think increasing the tolerance would not fix all issues. On the
MPTCP side, the latency introduced by debug kernels caused unexpected
retransmissions because the RTO was too low. I took the time to make sure
packets were always injected with enough delay, but judging by some
recent errors with the TCP packetdrill tests here, that might not be
enough, e.g.:
> tcp_zerocopy_batch.pkt:26: error handling packet: live packet payload: expected 4000 bytes vs actual 5000 bytes
In the end, and as previously mentioned, these adaptations for debug
kernels are perhaps not worth it: in this environment, it is probably
enough to ignore packetdrill results and focus on kernel warnings.
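The approach of the patch (treat packetdrill failures as expected on slow
machines) can be sketched as a simple result mapping. This is a Python
illustration, not the shell code used by the selftests; the environment
variable name KSFT_MACHINE_SLOW is an assumption here, and the result
labels are simplified rather than exact KTAP output.

```python
# Hypothetical sketch: report packetdrill failures as XFAIL on slow machines.
def is_slow_machine(env: dict) -> bool:
    # Assumption: CI marks debug/slow kernels via KSFT_MACHINE_SLOW.
    return bool(env.get("KSFT_MACHINE_SLOW"))

def result_label(test_passed: bool, slow_machine: bool) -> str:
    if test_passed:
        return "PASS"
    # On slow machines, timing-related failures are expected, so they are
    # not reported as real failures; kernel warnings remain the signal.
    return "XFAIL" if slow_machine else "FAIL"

print(result_label(False, is_slow_machine({"KSFT_MACHINE_SLOW": "yes"})))  # XFAIL
print(result_label(False, is_slow_machine({})))                            # FAIL
```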
Acked-by: Matthieu Baerts (NGI0) <matttbe@...nel.org>
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.