Message-Id: <20210511231223.2895398-23-paulmck@kernel.org>
Date: Tue, 11 May 2021 16:12:20 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: rcu@...r.kernel.org
Cc: linux-kernel@...r.kernel.org, kernel-team@...com, mingo@...nel.org,
jiangshanlai@...il.com, akpm@...ux-foundation.org,
mathieu.desnoyers@...icios.com, josh@...htriplett.org,
tglx@...utronix.de, peterz@...radead.org, rostedt@...dmis.org,
dhowells@...hat.com, edumazet@...gle.com, fweisbec@...il.com,
oleg@...hat.com, joel@...lfernandes.org,
"Paul E. McKenney" <paulmck@...nel.org>
Subject: [PATCH tip/core/rcu 23/26] torture: Make kvm-remote.sh account for network failure in pathname checks
In a long-duration kvm-remote.sh run, almost all of the remote accesses will
be simple file-existence checks. These are thus the most likely to be caught
out by network failures, which do happen from time to time.
This commit therefore takes a first step towards tolerating temporary
network outages by making the file-existence checks repeat in the face of
such an outage. They also print a message every minute during an outage,
allowing the user to take appropriate action.
Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
---
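Note on the retry logic: ssh exits with the status of the remote command,
or with 255 if ssh itself hit an error, so a status of 255 from
ssh "test -f ..." is treated as "could not reach the system" rather than
"file absent". The sketch below exercises the same control flow locally,
without a network. It is illustrative only: fake_remote_test and
retry_on_255 are hypothetical names that are not part of kvm-remote.sh,
and the sleep time is passed as an argument (0 here) purely so that the
example finishes immediately.

```shell
# Hypothetical stand-in for: ssh "$system" "test -f \"$path\""
# It returns 255 (ssh's own failure status) on the first two calls,
# then 0, as if the network came back and the file exists.
attempts=0
fake_remote_test () {
	attempts=$((attempts + 1))
	if test "$attempts" -le 2
	then
		return 255
	fi
	return 0
}

# Same loop shape as checkremotefile, but with the command and the
# sleep time supplied by the caller (an assumption of this sketch).
# Usage: retry_on_255 sleeptime command [args...]
retry_on_255 () {
	_sleeptime=$1
	shift
	while :
	do
		"$@"
		_ret=$?
		# Any status other than 255 is the command's real answer.
		if test "$_ret" -ne 255
		then
			return $_ret
		fi
		echo " --- failure running $*, retry after $_sleeptime seconds. $(date)" 1>&2
		sleep "$_sleeptime"
	done
}

retry_on_255 0 fake_remote_test
echo "status=$? attempts=$attempts"
```

A status of 1 ("file absent") passes straight through the loop without
a retry, which is what lets the callers keep using the function directly
in "if" and "while" conditions.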
.../selftests/rcutorture/bin/kvm-remote.sh | 26 +++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-remote.sh b/tools/testing/selftests/rcutorture/bin/kvm-remote.sh
index f08d415d4f99..20e848d2c0bb 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-remote.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-remote.sh
@@ -159,6 +159,28 @@ do
fi
done
+# Function to check for presence of a file on the specified system.
+# Complain if the system cannot be reached, and retry after a wait.
+# Currently just waits forever if a machine disappears.
+#
+# Usage: checkremotefile system pathname
+checkremotefile () {
+	local ret
+	local sleeptime=60
+
+	while :
+	do
+		ssh $1 "test -f \"$2\""
+		ret=$?
+		if test "$ret" -ne 255
+		then
+			return $ret
+		fi
+		echo " ---" ssh failure to $1 checking for file $2, retry after $sleeptime seconds. `date`
+		sleep $sleeptime
+	done
+}
+
# Function to start batches on idle remote $systems
#
# Usage: startbatches curbatch nbatches
@@ -178,7 +200,7 @@ startbatches () {
echo $((nbatches + 1))
return 0
fi
- if ssh "$i" "test -f \"$resdir/$ds/remote.run\"" 1>&2
+ if checkremotefile "$i" "$resdir/$ds/remote.run" 1>&2
then
continue # System still running last test, skip.
fi
@@ -216,7 +238,7 @@ echo All batches started. `date`
# Wait for all remaining scenarios to complete and collect results.
for i in $systems
do
- while ssh "$i" "test -f \"$resdir/$ds/remote.run\""
+ while checkremotefile "$i" "$resdir/$ds/remote.run"
do
sleep 30
done
--
2.31.1.189.g2e36527f23