netdev - Re: [PATCH net-next 0/4] selftests/net: packetdrill: import multiple tests

Message-ID: <6768dd1289ee2_3cff202943a@willemb.c.googlers.com.notmuch>
Date: Sun, 22 Dec 2024 22:46:26 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jakub Kicinski <kuba@...nel.org>, 
 Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Paolo Abeni <pabeni@...hat.com>, 
 Soham Chakradeo <sohamch.kernel@...il.com>, 
 Willem de Bruijn <willemb@...gle.com>, 
 netdev@...r.kernel.org, 
 davem@...emloft.net, 
 edumazet@...gle.com, 
 linux-kselftest@...r.kernel.org, 
 Soham Chakradeo <sohamch@...gle.com>
Subject: Re: [PATCH net-next 0/4] selftests/net: packetdrill: import multiple
 tests

Jakub Kicinski wrote:
> On Thu, 19 Dec 2024 14:31:44 -0500 Willem de Bruijn wrote:
> > All three timestamping flakes are instances where the script expects
> > the timestamp to be taken essentially instantaneously after the send
> > call.
> > 
> > This is not the case, and the delay is outside even the 14K tolerance.
> > I see occurrences of 20K. At some point we cannot keep increasing the
> > tolerance, perhaps.
> 
> I pinned the other services away and gave the packetdrill tester its
> own cores. Let's see how much of a difference this makes.
> The net-next-2024-12-20--03-00 branch will be the first to have this.

Thanks. It does not seem to resolve the flakes.

At this point I think the best path is to run them in debug mode to
get coverage, but ignore errors. With the below draft patch, error
output is still logged. For instance:

# tcp_timestamping_partial.pkt:58: runtime error in recvmsg call: Bad timestamp 0 in scm_timestamping 0: expected=1734924748967958 (20000) actual=1734924748982069 (34111) start=1734924748947958
# ok 2 ipv6 # SKIP
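
To make the numbers in that log line concrete, here is a minimal sketch of
the arithmetic. It assumes the check is a plain absolute difference between
the expected and actual offsets from start, compared against the 14000 usec
value passed via --tolerance_usecs; packetdrill's real check may take other
settings into account.

```shell
#!/bin/bash
# Values copied from the log line above (microseconds since the epoch).
start=1734924748947958
expected=1734924748967958
actual=1734924748982069
tolerance_usecs=14000   # the value passed via --tolerance_usecs in ksft_runner.sh

# Offsets relative to the start of the script, as printed in parentheses
# in the packetdrill error message.
expected_offset=$(( expected - start ))
actual_offset=$(( actual - start ))

# Absolute difference between when the timestamp was expected and when it
# was actually taken.
delta=$(( actual_offset - expected_offset ))
if (( delta < 0 )); then
    delta=$(( -delta ))
fi

echo "expected offset: ${expected_offset} us"
echo "actual offset:   ${actual_offset} us"
echo "delta:           ${delta} us (tolerance ${tolerance_usecs} us)"

if (( delta > tolerance_usecs )); then
    verdict="outside tolerance"
else
    verdict="within tolerance"
fi
echo "$verdict"
```

The timestamp lands 14111 usec later than expected, just past the 14000
usec tolerance.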

Such timestamping test failures are fairly straightforward. We could
just increase the KSFT_MACHINE_SLOW tolerance. But other tests see an
actual difference in TCP stack behavior, e.g., in packet size. That is
not addressed by further relaxing the tolerance.


+++ b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
@@ -23,7 +23,7 @@ if [ $# -ne 1 ]; then
        ktap_exit_fail_msg "usage: $0 <script>"
        exit "$KSFT_FAIL"
 fi
-script="$1"
+script="$(basename "$1")"
 
 if [ -z "$(which packetdrill)" ]; then
        ktap_skip_all "packetdrill not found in PATH"
@@ -31,16 +31,27 @@ if [ -z "$(which packetdrill)" ]; then
 fi
 
 declare -a optargs
+failfunc=ktap_test_fail
+
 if [[ -n "${KSFT_MACHINE_SLOW}" ]]; then
        optargs+=('--tolerance_usecs=14000')
+
+       declare -ar skip_list=(
+               "tcp_fast_recovery_prr-ss.*.pkt"
+               "tcp_timestamping.*.pkt"
+               "tcp_user_timeout_user-timeout-probe.pkt"
+               "tcp_zerocopy_epoll_.*.pkt"
+       )
+       readonly skip_pattern="^($(printf '%s|' "${skip_list[@]}"))$"
+       [[ "$script" =~ ${skip_pattern} ]] && failfunc=ktap_test_skip
 fi
 
 ktap_print_header
 ktap_set_plan 2
 
-unshare -n packetdrill ${ipv4_args[@]} ${optargs[@]} $(basename $script) > /dev/null \
-       && ktap_test_pass "ipv4" || ktap_test_fail "ipv4"
-unshare -n packetdrill ${ipv6_args[@]} ${optargs[@]} $(basename $script) > /dev/null \
-       && ktap_test_pass "ipv6" || ktap_test_fail "ipv6"
+unshare -n packetdrill ${ipv4_args[@]} ${optargs[@]} $script > /dev/null \
+       && ktap_test_pass "ipv4" || $failfunc "ipv4"
+unshare -n packetdrill ${ipv6_args[@]} ${optargs[@]} $script > /dev/null \
+       && ktap_test_pass "ipv6" || $failfunc "ipv6"
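
The skip-list matching in the draft can be tried in isolation. The sketch
below is a standalone illustration, not part of the patch: classify is a
hypothetical helper, tcp_example_other.pkt is a made-up script name, and the
pattern is joined via IFS rather than printf '%s|' so that no trailing empty
alternative ends up in the regex.

```shell
#!/bin/bash
# Skip list copied from the draft patch.
skip_list=(
    "tcp_fast_recovery_prr-ss.*.pkt"
    "tcp_timestamping.*.pkt"
    "tcp_user_timeout_user-timeout-probe.pkt"
    "tcp_zerocopy_epoll_.*.pkt"
)

# Join the entries with '|' inside a subshell so IFS is not changed for the
# rest of the script. This variant differs from the draft, which uses
# printf '%s|' and therefore leaves a trailing '|' in the group.
skip_pattern="^($(IFS='|'; echo "${skip_list[*]}"))$"

# Hypothetical helper: report which result a failing script would map to.
classify() {
    if [[ "$1" =~ ${skip_pattern} ]]; then
        echo "skip"
    else
        echo "fail"
    fi
}

# Script name taken from the log output earlier in this mail.
echo "tcp_timestamping_partial.pkt -> $(classify tcp_timestamping_partial.pkt)"
# Made-up name that is not on the skip list.
echo "tcp_example_other.pkt -> $(classify tcp_example_other.pkt)"
```

Only names that match one of the listed expressions in full are downgraded
from fail to skip; everything else keeps ktap_test_fail.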