Message-ID: <1694191.1726348322@warthog.procyon.org.uk>
Date: Sat, 14 Sep 2024 22:12:02 +0100
From: David Howells <dhowells@...hat.com>
To: Marc Dionne <marc.dionne@...istor.com>
Cc: dhowells@...hat.com, linux-afs@...ts.infradead.org,
Markus Suvanto <markus.suvanto@...il.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] afs: Fix possible infinite loop with unresponsive servers

From: Marc Dionne <marc.dionne@...istor.com>

afs: Fix possible infinite loop with unresponsive servers

A return code of 0 from afs_wait_for_one_fs_probe() is an indication
that the endpoint state attached to the operation is stale and has
been superseded. In that case the iteration needs to be restarted
so that the newer probe result state gets used.

Failure to do so can result in a tight infinite loop around the
iterate_address label, where all addresses are thought to be responsive
and have been tried, with nothing to refresh the endpoint state.

[DH: Changed the priority of the returns from afs_wait_for_one_fs_probe(),
made the first caller go back to iterate_address if 1 is returned, refetch
the database records and begin again from the beginning if 0 is returned,
and otherwise deal with an error. Altered the second caller to also handle
the "1" return.]

Fixes: 495f2ae9e355 ("afs: Fix fileserver rotation")
Reported-by: Markus Suvanto <markus.suvanto@...il.com>
Link: https://lists.infradead.org/pipermail/linux-afs/2024-July/008628.html
cc: linux-afs@...ts.infradead.org
Signed-off-by: Marc Dionne <marc.dionne@...istor.com>
Signed-off-by: David Howells <dhowells@...hat.com>
Link: https://lore.kernel.org/r/20240906134019.131553-1-marc.dionne@auristor.com/
---
fs/afs/fs_probe.c | 4 ++--
fs/afs/rotate.c | 11 ++++++++---
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/afs/fs_probe.c b/fs/afs/fs_probe.c
index 580de4adaaf6..b516d05b0fef 100644
--- a/fs/afs/fs_probe.c
+++ b/fs/afs/fs_probe.c
@@ -506,10 +506,10 @@ int afs_wait_for_one_fs_probe(struct afs_server *server, struct afs_endpoint_sta
 	finish_wait(&server->probe_wq, &wait);
 
 dont_wait:
-	if (estate->responsive_set & ~exclude)
-		return 1;
 	if (test_bit(AFS_ESTATE_SUPERSEDED, &estate->flags))
 		return 0;
+	if (estate->responsive_set & ~exclude)
+		return 1;
 	if (is_intr && signal_pending(current))
 		return -ERESTARTSYS;
 	if (timo == 0)
diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c
index ed09d4d4c211..d612983d6f38 100644
--- a/fs/afs/rotate.c
+++ b/fs/afs/rotate.c
@@ -632,8 +632,10 @@ bool afs_select_fileserver(struct afs_operation *op)
 wait_for_more_probe_results:
 	error = afs_wait_for_one_fs_probe(op->server, op->estate, op->addr_tried,
 					  !(op->flags & AFS_OPERATION_UNINTR));
-	if (!error)
+	if (error == 1)
 		goto iterate_address;
+	if (!error)
+		goto restart_from_beginning;
 
 	/* We've now had a failure to respond on all of a server's addresses -
 	 * immediately probe them again and consider retrying the server.
@@ -644,10 +646,13 @@ bool afs_select_fileserver(struct afs_operation *op)
 		error = afs_wait_for_one_fs_probe(op->server, op->estate, op->addr_tried,
 						  !(op->flags & AFS_OPERATION_UNINTR));
 		switch (error) {
-		case 0:
+		case 1:
 			op->flags &= ~AFS_OPERATION_RETRY_SERVER;
-			trace_afs_rotate(op, afs_rotate_trace_retry_server, 0);
+			trace_afs_rotate(op, afs_rotate_trace_retry_server, 1);
 			goto retry_server;
+		case 0:
+			trace_afs_rotate(op, afs_rotate_trace_retry_server, 0);
+			goto restart_from_beginning;
 		case -ERESTARTSYS:
 			afs_op_set_error(op, error);
 			goto failed;