Message-ID: <a1ac7a6e-4447-4476-8fb7-fb5f0d7ec979@arista.com>
Date: Thu, 1 Feb 2024 23:37:06 +0000
From: Dmitry Safonov <dima@...sta.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Shuah Khan <shuah@...nel.org>, Dmitry Safonov <0x7f454c46@...il.com>,
Mohammad Nassiri <mnassiri@...na.com>, Simon Horman <horms@...nel.org>,
netdev@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/3] selftests/net: A couple of typos fixes in
key-management/rst tests
On 2/1/24 22:25, Dmitry Safonov wrote:
> Hi Jakub,
>
> On 2/1/24 21:21, Jakub Kicinski wrote:
>> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
>>> Please let me know if there are other issues with the tcp-ao tests :)
>>>
>>> Going to work on tracepoints and some other TCP-AO stuff for net-next.
>>
>> Since you're being nice and helpful, I figured I'd try testing TCP-AO
>> with debug options enabled :) (kernel/configs/debug.config and
>> kernel/configs/x86_debug.config included),
>
> Haha :)
>
>> that slows things down
>> and causes a bit of flakiness in unsigned-md5-* tests:
>>
>> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao
>>
>> This has links to outputs:
>> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0
>>
>> If it's a timing thing - FWIW we started exporting
>> KSFT_MACHINE_SLOW=yes on the slow runners.
>
> I think I know what happens here:
>
> # ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4 => 6
> # ok 9 AO server (AO_REQUIRED): unsigned client
> # ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired increased 1 => 2
> # not ok 11 AO server (AO_REQUIRED): unsigned client: Counter netns_ao_good was not expected to increase 7 => 8
>
> For each of the tests, the server listens on a new port but reuses the same
> namespaces + veth pair. If the machine is quite slow, I guess a segment
> may have been retransmitted after the test that initiated it had already
> finished.
> As a result, the per-namespace counters are incremented, which makes
> the test fail (IOW, the test expects all segments in the netns to be dropped).
>
> So I should go with one of these options:
>
> 1. relax per-namespace checks (the per-socket and per-key counters would
> still be checked)
> 2. unshare(net) + veth setup for each test
> 3. split the selftest into smaller ones (each creates a new net-ns during
> initialization)
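The failure mode and option (1) can be modelled in a few lines of Python. This is a minimal sketch, not the selftest's code: it assumes each sub-test snapshots its counters before and after running and flags any counter that grew without being expected to. The names `check_delta` and `slow_machine`, the dict layout, and the per-socket counter `sock_ao_good` are hypothetical; `netns_ao_good` and the 7 => 8 values come from the log above, and `KSFT_MACHINE_SLOW=yes` is the variable Jakub mentions.

```python
import os


def slow_machine(env=os.environ):
    """True on runners that export KSFT_MACHINE_SLOW=yes."""
    return env.get("KSFT_MACHINE_SLOW") == "yes"


def check_delta(before, after, may_increase, relax_netns=False):
    """Return a failure message for every counter that grew unexpectedly.

    before/after map counter name -> value (hypothetical layout).
    may_increase holds the counters this sub-test is expected to bump.
    relax_netns skips per-namespace counters (prefix "netns_"), i.e.
    option (1): they are shared by every sub-test in the namespace, so a
    late segment from an earlier sub-test can bump them.
    """
    failures = []
    for name, old in before.items():
        new = after[name]
        if new <= old or name in may_increase:
            continue
        if relax_netns and name.startswith("netns_"):
            continue
        failures.append(
            f"Counter {name} was not expected to increase {old} => {new}")
    return failures


# Unsigned-client sub-test: a retransmission left over from the previous
# AO-client sub-test bumps the shared netns counter, while the per-socket
# counter (hypothetical name) of the unsigned client stays put.
before = {"netns_ao_good": 7, "sock_ao_good": 0}
after = {"netns_ao_good": 8, "sock_ao_good": 0}
print(check_delta(before, after, may_increase=set()))
# ['Counter netns_ao_good was not expected to increase 7 => 8']
print(check_delta(before, after, may_increase=set(),
                  relax_netns=slow_machine({"KSFT_MACHINE_SLOW": "yes"})))
# []
```

Tying the relaxation to `KSFT_MACHINE_SLOW` is only one possible policy: it keeps the strict check on fast runners at the cost of weaker coverage on slow ones.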
Actually, I think there may be an easier fix:
4. Make sure that the client close()s the TCP-AO connection first, so that
it becomes a twsk (time-wait socket), and that the net-ns counters are read
after the server's close().
I'll do this and see whether it fixes the flakiness on the netdev bot :)
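Why the read point matters can be seen with a toy timeline (arbitrary time units and made-up numbers for illustration, not measurements). It is a sketch under the assumption behind option 4: once the client has closed first and the server's close() has returned, no further segments from that sub-test reach the namespace counters. The function `strays_in_window` is hypothetical.

```python
def strays_in_window(arrivals, start, end):
    """Count segments that hit the shared netns counter in [start, end)."""
    return sum(1 for t in arrivals if start <= t < end)


# Sub-test A (AO client) finishes at t=10, but on a slow runner one of its
# segments is retransmitted and only reaches the server's netns at t=12.
late_segments_from_a = [12]

# Old order: A reads its counters at t=10, and sub-test B (unsigned client)
# takes its "before" snapshot right away and its "after" snapshot at t=20,
# so A's late segment is charged to B.
print(strays_in_window(late_segments_from_a, 10, 20))  # 1

# Option 4: A reads counters only after the server's close(), assumed here
# to return at t=13, once A's traffic has settled; B's window starts there.
print(strays_in_window(late_segments_from_a, 13, 23))  # 0
```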
> I'd probably prefer (2): although it slows down that slow machine even
> more, I don't think creating two net-ns plus a veth pair per test
> would add much overhead, even on an RPi board. But let's see,
> maybe I'll just go with (1), as that's really easy.
>
> I'll cook a patch this week.
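Option (2) removes the shared state instead of reordering around it. The class below is a hypothetical in-memory stand-in for a network namespace (no real unshare() or veth is involved); it only shows that with a fresh namespace per sub-test, a late segment from sub-test A lands in A's counters rather than B's.

```python
class FakeNetns:
    """Hypothetical stand-in for a net namespace: just its counters."""

    def __init__(self):
        self.counters = {"netns_ao_good": 0}

    def deliver_good_segment(self):
        self.counters["netns_ao_good"] += 1


def run_pair(fresh_ns_per_test):
    """Run sub-tests A then B; return how much B's netns counter moved."""
    ns_a = FakeNetns()
    ns_b = FakeNetns() if fresh_ns_per_test else ns_a

    ns_a.deliver_good_segment()          # A's regular traffic
    before_b = ns_b.counters["netns_ao_good"]
    ns_a.deliver_good_segment()          # A's late retransmission, during B
    # B (unsigned client) itself produces no AO-good segments.
    return ns_b.counters["netns_ao_good"] - before_b


print(run_pair(fresh_ns_per_test=False))  # 1: shared netns, spurious failure
print(run_pair(fresh_ns_per_test=True))   # 0: per-test netns, B unaffected
```

The cost, as noted above, is setup time: a real version would create two namespaces and a veth pair per sub-test instead of once per test binary.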
Thanks,
Dmitry