[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zz3+rNnvxE2TRT0v@tissot.1015granger.net>
Date: Wed, 20 Nov 2024 10:22:20 -0500
From: Chuck Lever <chuck.lever@...cle.com>
To: Benjamin Coddington <bcodding@...hat.com>
Cc: Trond Myklebust <trondmy@...nel.org>, Anna Schumaker <anna@...nel.org>,
linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
Christoph Hellwig <hch@....de>
Subject: Re: [PATCH 2/2] nfs/blocklayout: Limit repeat device registration on
failure
On Wed, Nov 20, 2024 at 09:09:35AM -0500, Benjamin Coddington wrote:
> If we're unable to register a SCSI device, ensure we mark the device as
> unavailable so that it will timeout and be re-added via GETDEVINFO. This
> avoids repeated doomed attempts to register a device in the IO path.
>
> Add some clarifying comments as well.
>
> Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration")
> Signed-off-by: Benjamin Coddington <bcodding@...hat.com>
> Reviewed-by: Christoph Hellwig <hch@....de>
> ---
> fs/nfs/blocklayout/blocklayout.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 0becdec12970..b36bc2f4f7e2 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -571,19 +571,29 @@ bl_find_get_deviceid(struct nfs_server *server,
> if (!node)
> return ERR_PTR(-ENODEV);
>
> + /*
> + * Devices that are marked unavailable are left in the cache with a
> + * timeout to avoid sending GETDEVINFO after every LAYOUTGET, or
> + * constantly attempting to register the device. Once marked as
> + * unavailable they must be deleted and never reused.
> + */
> if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) {
> unsigned long end = jiffies;
> unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT;
>
> if (!time_in_range(node->timestamp_unavailable, start, end)) {
> + /* Force a new GETDEVINFO for this LAYOUT */
Or perhaps: "Uncork subsequent GETDEVINFO operations for this device"
<shrug>
> nfs4_delete_deviceid(node->ld, node->nfs_client, id);
> goto retry;
> }
> goto out_put;
> }
>
> - if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node)))
> + /* If we cannot register, treat this device as transient */
How about "Make a negative cache entry for this device"
> + if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) {
> + nfs4_mark_deviceid_unavailable(node);
> goto out_put;
> + }
>
> return node;
>
> --
> 2.47.0
>
It took me a bit to understand what this patch does. It is like
setting up a negative dentry so the local device cache absorbs
bursts of checks for the device. OK.
Just an observation: Negative caching has some consequences too.
For instance, there will now be a period where, if the device
happens to become available, the layout is still unusable. I wonder
if that's going to have some undesirable operational effects.
--
Chuck Lever
Powered by blists - more mailing lists