Message-Id: <20260127-psp-flaky-test-v1-1-13403e390af3@gmail.com>
Date: Tue, 27 Jan 2026 08:30:55 -0800
From: Daniel Zahka <daniel.zahka@...il.com>
To: Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Shuah Khan <shuah@...nel.org>, Willem de Bruijn <willemb@...gle.com>
Cc: netdev@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-kernel@...r.kernel.org, Daniel Zahka <daniel.zahka@...il.com>
Subject: [PATCH net-next] selftests: drv-net: psp: fix test flakes from
racy connection close
There is a bug in assoc_sk_only_mismatch() and
assoc_sk_only_mismatch_tx() that creates a race condition which
triggers test flakes in later test cases, e.g. data_send_bad_key().

The problem is that the client uses the "conn clr" rpc to set up a data
connection with psp_responder, but never uses a matching "data close"
rpc. This creates a race condition: if the client can queue another
data sock request, like in data_send_bad_key(), before the server can
accept the old connection from the backlog, we end up with two
connections in the backlog: one for the closed connection we have
received a FIN for, and one for the new PSP connection which is
expecting to do key exchange.

From there the server pops the closed connection from the backlog, but
the data_send_bad_key() test case in psp.py hangs waiting to perform
key exchange.

The fix is to properly use _close_conn(), which will force the server to
remove the closed connection from the backlog before sending the RPC
ack to the client.

Signed-off-by: Daniel Zahka <daniel.zahka@...il.com>
---
The data_send_bad_key() test case has been flaking in automated
testing. The root cause is actually some racy connection
setup/teardown logic between the client and server in the preceding
test cases.

I have detailed the exact circumstances for the test failure in the
commit message. To reproduce the issue deterministically, I inserted a
sleep into the psp_responder.c conn clr handler:

 if (cmd("conn clr")) {
     if (accept_cfg != ACCEPT_CFG_NONE)
         fprintf(stderr, "WARN: old conn config still set!\n");
     accept_cfg = ACCEPT_CFG_CLEAR;
     send_ack(comm_sock);
+    sleep(1);
 }

which produces the following error when running just two tests:

1..2
ok 1 psp.assoc_sk_only_mismatch
# Exception| Traceback (most recent call last):
# Exception| File "/data/users/dzahka/psp-flaky-test/tools/testing/selftests/net/lib/py/ksft.py", line 319, in ksft_run
# Exception| func(*args)
# Exception| File "/data/users/dzahka/psp-flaky-test/./tools/testing/selftests/drivers/net/psp.py", line 420, in data_send_bad_key
# Exception| tx = _spi_xchg(s, rx)
# Exception| File "/data/users/dzahka/psp-flaky-test/./tools/testing/selftests/drivers/net/psp.py", line 65, in _spi_xchg
# Exception| tx = s.recv(4 + len(rx['key']))
# Exception| File "/data/users/dzahka/psp-flaky-test/tools/testing/selftests/net/lib/py/ksft.py", line 258, in _ksft_intr
# Exception| raise KsftTerminate()
# Exception| net.lib.py.ksft.KsftTerminate
# Stopping tests due to KsftTerminate.
not ok 2 psp.data_send_bad_key
# Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0
#
# Responder logs (-15):
# STDERR:
# Set PSP enable on device 3 to 0xf
# DEBUG: ...
# DEBUG: command: conn clr
# DEBUG: ...
# DEBUG: command: conn psp
# WARN: old conn config still set!
# DEBUG: new data sock: psp
# DEBUG: create PSP connection
# DEBUG: ...
# DEBUG: data sock closed
# DEBUG: ...
# WARN: new data sock but no config
# DEBUG: ...
# DEBUG: data read 20
# DEBUG: ...
Traceback (most recent call last):
The problem is caused by the conn clr and conn psp RPC handlers
running consecutively without the first connection being accepted and
closed by the server.
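
The backlog ordering behind this failure can be seen outside the
selftest with plain TCP sockets. The sketch below is a standalone
illustration, not code from psp.py or psp_responder.c: the function
name backlog_order_demo() and the b"hello" payload are made up for
the example, and it assumes loopback TCP where the accept queue
hands out connections in arrival order.

```python
import socket


def backlog_order_demo():
    """Queue a closed and a live connection, then accept both in order."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    listener.settimeout(2)
    addr = listener.getsockname()

    # Old connection: opened and closed before the server ever accepts it,
    # analogous to a "conn clr" socket with no matching "data close".
    old = socket.create_connection(addr)
    old.close()

    # New connection: still open and waiting for the server to respond,
    # analogous to the PSP socket that expects key exchange.
    new = socket.create_connection(addr)
    new.sendall(b"hello")

    first, _ = listener.accept()
    second, _ = listener.accept()
    try:
        first.settimeout(2)
        second.settimeout(2)
        # The oldest queued connection is the one whose peer already sent FIN.
        first_data = first.recv(16)
        second_data = second.recv(16)
    finally:
        for sock in (first, second, new, listener):
            sock.close()
    return first_data, second_data


if __name__ == "__main__":
    print(backlog_order_demo())
```

If the server treats the first accepted socket as the one matching its
current config, it ends up talking to a dead connection while the live
one stays queued, which is the hang seen in data_send_bad_key().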
The fix is simply to match all conn clr commands with a data close
RPC. This forces the trace to be:
# Set PSP enable on device 3 to 0xf
# DEBUG: ...
# DEBUG: command: conn clr
# DEBUG: ...
# DEBUG: command: data close
# DEBUG: new data sock: clear
# DEBUG: ...
# DEBUG: command: conn psp
# DEBUG: ...
# DEBUG: new data sock: psp
# DEBUG: create PSP connection
So the closed connection from the conn clr is removed from the backlog
before sending the ack for data close to the client.
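
The ordering guarantee can be sketched the same way. This is a
minimal model under stated assumptions, not the real responder: the
helper handle_data_close(), the b"ack" and b"key" payloads, and the
use of a socketpair as the control channel are all hypothetical. It
only shows that acking after the accept leaves the backlog empty for
the next connection.

```python
import socket


def handle_data_close(listener, ctrl):
    """Hypothetical responder step: drain the pending connection, then ack."""
    conn, _ = listener.accept()
    conn.settimeout(2)
    while conn.recv(4096):  # read until EOF (peer's FIN)
        pass
    conn.close()
    ctrl.sendall(b"ack")  # ack is sent only after the backlog entry is gone


def close_then_reconnect_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    listener.settimeout(2)
    addr = listener.getsockname()
    client_ctrl, server_ctrl = socket.socketpair()
    client_ctrl.settimeout(2)
    server_ctrl.settimeout(2)

    # Client side: open and close a data connection, then request the close.
    data = socket.create_connection(addr)
    data.close()
    client_ctrl.sendall(b"data close")

    # Server side: read the command and handle it before acking.
    assert server_ctrl.recv(64) == b"data close"
    handle_data_close(listener, server_ctrl)

    # Client side: wait for the ack before queueing the next connection.
    ack = client_ctrl.recv(16)
    new = socket.create_connection(addr)
    new.sendall(b"key")

    # The next accept now returns the live connection, not a stale one.
    conn, _ = listener.accept()
    conn.settimeout(2)
    payload = conn.recv(16)

    for sock in (conn, new, client_ctrl, server_ctrl, listener):
        sock.close()
    return ack, payload


if __name__ == "__main__":
    print(close_then_reconnect_demo())
```

Because the client blocks on the ack, it cannot queue a second data
connection while the first one is still sitting in the backlog.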
---
tools/testing/selftests/drivers/net/psp.py | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/psp.py b/tools/testing/selftests/drivers/net/psp.py
index 528a421ecf76..864d9fce1094 100755
--- a/tools/testing/selftests/drivers/net/psp.py
+++ b/tools/testing/selftests/drivers/net/psp.py
@@ -266,6 +266,7 @@ def assoc_sk_only_mismatch(cfg):
the_exception = cm.exception
ksft_eq(the_exception.nl_msg.extack['bad-attr'], ".dev-id")
ksft_eq(the_exception.nl_msg.error, -errno.EINVAL)
+ _close_conn(cfg, s)
def assoc_sk_only_mismatch_tx(cfg):
@@ -283,6 +284,7 @@ def assoc_sk_only_mismatch_tx(cfg):
the_exception = cm.exception
ksft_eq(the_exception.nl_msg.extack['bad-attr'], ".dev-id")
ksft_eq(the_exception.nl_msg.error, -errno.EINVAL)
+ _close_conn(cfg, s)
def assoc_sk_only_unconn(cfg):
---
base-commit: a8a6c8cc8796ac573fb3902803da28cfa374787c
change-id: 20260126-psp-flaky-test-ea613ea5386c
Best regards,
--
Daniel Zahka <daniel.zahka@...il.com>