linux-kernel - Re: [PATCH net 4/7] selftests/net: Add mixed select()+polling mode to TCP-AO tests

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a360fcbc-19e5-4ee6-9b80-2621fefd9ad6@redhat.com>
Date: Tue, 18 Mar 2025 15:47:13 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Dmitry Safonov <0x7f454c46@...il.com>
Cc: netdev@...r.kernel.org, linux-kselftest@...r.kernel.org,
 linux-kernel@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Simon Horman <horms@...nel.org>, Shuah Khan <shuah@...nel.org>
Subject: Re: [PATCH net 4/7] selftests/net: Add mixed select()+polling mode to
 TCP-AO tests

On 3/12/25 10:10 AM, Dmitry Safonov wrote:
> Currently, tcp_ao tests have two timeouts: TEST_RETRANSMIT_SEC and
> TEST_TIMEOUT_SEC [by default 1 and 5 seconds]. The first one,
> TEST_RETRANSMIT_SEC is used for operations that are expected to succeed
> in order for a test to pass. It is usually not consumed and exists only
> to avoid indefinite test run if the operation didn't complete.
> The second one, TEST_RETRANSMIT_SEC exists for the tests that checking
> operations, that are expected to fail/timeout. It is shorter as it is
> fully consumed, with an expectation that if operation didn't succeed
> during that period, it will timeout. And the related test that expects
> the timeout is passing. The actual operation failure is then
> cross-verified by other means like counters checks.
> 
> The issue with TEST_RETRANSMIT_SEC timeout is that 1 second is the exact
> initial TCP timeout. So, in case the initial segment gets lost (quite
> unlikely on local veth interface between two net namespaces, yet happens
> in slow VMs), the retransmission never happens and as a result, the test
> is not actually testing the functionality. Which in the end fails
> counters checks.
> 
> As I want tcp_ao selftests to be fast and finishing in a reasonable
> amount of time on manual run, I didn't consider increasing
> TEST_RETRANSMIT_SEC.
> 
> Rather, initially, BPF_SOCK_OPS_TIMEOUT_INIT looked promising as a lever
> to make the initial TCP timeout shorter. But as it's not a socket bpf
> attached thing, but sock_ops (attaches to cgroups), the selftests would
> have to use libbpf, which I wanted to avoid if not absolutely required.
> 
> Instead, use a mixed select() and counters polling mode with the longer
> TEST_TIMEOUT_SEC timeout to detect running-away failed tests. It
> actually not only allows losing segments and succeeding after
> the previous TEST_RETRANSMIT_SEC timeout was consumed, but makes
> the tests expecting timeout/failure pass faster.
> 
> The only test case taking longer (TEST_TIMEOUT_SEC) now is connect-deny
> "wrong snd id", which checks for no key on SYN-ACK for which there is no
> counter in the kernel (see tcp_make_synack()). Yet it can be speed up
> by poking skpair from the trace event (see trace_tcp_ao_synack_no_key).
> 
> Reported-by: Jakub Kicinski <kuba@...nel.org>
> Closes: https://lore.kernel.org/netdev/20241205070656.6ef344d7@kernel.org/
> Signed-off-by: Dmitry Safonov <0x7f454c46@...il.com>

Could you please provide a suitable Fixes tag here?

Also given a good slices of the patches here are refactor, I think the
whole series could land on net-next - so that we avoid putting a bit of
stuff in the last 6.14-net PR - WDYT?

Thanks,

Paolo