[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87ed114mbw.fsf@nvidia.com>
Date: Fri, 17 Jan 2025 13:49:23 +0100
From: Petr Machata <petrm@...dia.com>
To: Jakub Kicinski <kuba@...nel.org>
CC: <davem@...emloft.net>, <netdev@...r.kernel.org>, <edumazet@...gle.com>,
<pabeni@...hat.com>, <andrew+netdev@...n.ch>, <horms@...nel.org>,
<shuah@...nel.org>, <linux-kselftest@...r.kernel.org>,
<willemdebruijn.kernel@...il.com>
Subject: Re: [PATCH net-next] selftests: net: give up on the cmsg_time
accuracy on slow machines
Jakub Kicinski <kuba@...nel.org> writes:
> Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative
> cmsg_time.sh test") widened the accepted value range 8x but we still
> see flakes (at a rate of around 7%).
>
> Return XFAIL for the most timing sensitive test on slow machines.
>
> Before:
>
> # ./cmsg_time.sh
> Case UDPv4 - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK'
> FAIL - 1/36 cases failed
>
> After:
>
> # ./cmsg_time.sh
> Case UDPv4 - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL)
> Case UDPv6 - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL)
> OK
>
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> ---
> CC: shuah@...nel.org
> CC: linux-kselftest@...r.kernel.org
> CC: willemdebruijn.kernel@...il.com
> ---
> tools/testing/selftests/net/cmsg_time.sh | 35 +++++++++++++++++-------
> 1 file changed, 25 insertions(+), 10 deletions(-)
>
> diff --git a/tools/testing/selftests/net/cmsg_time.sh b/tools/testing/selftests/net/cmsg_time.sh
> index 1d7e756644bc..478af0aefa97 100755
> --- a/tools/testing/selftests/net/cmsg_time.sh
> +++ b/tools/testing/selftests/net/cmsg_time.sh
> @@ -34,13 +34,28 @@ BAD=0
> TOTAL=0
>
> check_result() {
> + local ret=$1
> + local got=$2
> + local exp=$3
> + local case=$4
> + local xfail=$5
> + local xf=
> + local inc=
> +
> + if [ "$xfail" == "xfail" ]; then
> + xf="(XFAIL)"
> + inc=0
> + else
> + inc=1
> + fi
> +
> ((TOTAL++))
> - if [ $1 -ne 0 ]; then
> - echo " Case $4 returned $1, expected 0"
> - ((BAD++))
> + if [ $ret -ne 0 ]; then
> + echo " Case $case returned $ret, expected 0 $xf"
> + ((BAD+=inc))
> elif [ "$2" != "$3" ]; then
> - echo " Case $4 returned '$2', expected '$3'"
> - ((BAD++))
> + echo " Case $case returned '$got', expected '$exp' $xf"
> + ((BAD+=inc))
> fi
> }
>
> @@ -66,14 +81,14 @@ for i in "-4 $TGT4" "-6 $TGT6"; do
> awk '/SND/ { if ($3 > 1000) print "OK"; }')
> check_result $? "$ts" "OK" "$prot - TXTIME abs"
>
> - [ "$KSFT_MACHINE_SLOW" = yes ] && delay=8000 || delay=1000
> + [ "$KSFT_MACHINE_SLOW" = yes ] && xfail=xfail
>
> - ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d $delay |
> + ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d 1000 |
> awk '/SND/ {snd=$3}
> /SCHED/ {sch=$3}
> - END { if (snd - sch > '$((delay/2))') print "OK";
> - else print snd, "-", sch, "<", '$((delay/2))'; }')
> - check_result $? "$ts" "OK" "$prot - TXTIME rel"
> + END { if (snd - sch > 500) print "OK";
> + else print snd, "-", sch, "<", 500; }')
> + check_result $? "$ts" "OK" "$prot - TXTIME rel" $xfail
> done
> done
This logging and xfail handling duplicates lib.sh. Would a patch like
below be OK with you? The gist of it is to just use check_err, log_test
and xfail_on_slow from lib.sh to achieve what the test open-codes.
I can send it as a follow-up instead if you prefer.
--- a/tools/testing/selftests/net/cmsg_time.sh
+++ b/tools/testing/selftests/net/cmsg_time.sh
@@ -29,19 +29,21 @@ ip -netns $NS addr add $IP6 dev dummy0
# Need FQ for TXTIME
ip netns exec $NS tc qdisc replace dev dummy0 root fq
-# Test
-BAD=0
-TOTAL=0
-
check_result() {
- ((TOTAL++))
- if [ $1 -ne 0 ]; then
- echo " Case $4 returned $1, expected 0"
- ((BAD++))
- elif [ "$2" != "$3" ]; then
- echo " Case $4 returned '$2', expected '$3'"
- ((BAD++))
- fi
+ local ret=$1
+ local got=$2
+ local exp=$3
+ local case=$4
+
+ RET=0
+
+ [ $1 -eq 0 ]
+ check_err $? "Case $4 returned $1, expected 0"
+
+ [ "$2" == "$3" ]
+ check_err $? "Case $4 returned '$2', expected '$3'"
+
+ log_test "$4"
}
for i in "-4 $TGT4" "-6 $TGT6"; do
@@ -73,15 +79,8 @@ for i in "-4 $TGT4" "-6 $TGT6"; do
/SCHED/ {sch=$3}
END { if (snd - sch > '$((delay/2))') print "OK";
else print snd, "-", sch, "<", '$((delay/2))'; }')
- check_result $? "$ts" "OK" "$prot - TXTIME rel"
+ xfail_on_slow check_result $? "$ts" "OK" "$prot - TXTIME rel"
done
done
-# Summary
-if [ $BAD -ne 0 ]; then
- echo "FAIL - $BAD/$TOTAL cases failed"
- exit 1
-else
- echo "OK"
- exit 0
-fi
+exit $EXIT_STATUS
This is much more verbose, but that's how tests tend to be. I could
change it to only log on RET != 0, but like this the custom results
block can go away and the test is overall more median.
bash-5.2# KSFT_MACHINE_SLOW=yes ./cmsg_time.sh
TEST: UDPv4 - no options [ OK ]
TEST: UDPv4 - ts cnt [ OK ]
TEST: UDPv4 - ts0 SCHED [ OK ]
TEST: UDPv4 - ts0 SND [ OK ]
TEST: UDPv4 - TXTIME abs [ OK ]
TEST: UDPv4 - TXTIME rel [XFAIL]
Case UDPv4 - TXTIME rel returned '34us - 30us < 4000', expected 'OK'
TEST: ICMPv4 - no options [ OK ]
TEST: ICMPv4 - ts cnt [ OK ]
[... snip ...]
bash-5.2# echo $?
0
Powered by blists - more mailing lists