[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CA+hkOd7_+n2qiXdOw1Z9nYTHP1tosqNOwDBXZwML0kND2WmG8w@mail.gmail.com>
Date: Sat, 27 Dec 2025 22:23:15 -0800
From: Alex Tran <alex.t.tran@...il.com>
To: Sagi Grimberg <sagi@...mberg.me>
Cc: Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>,
linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] nvme/host: add delayed retries upon non-fatal error
during ns validation
On Sat, Dec 27, 2025 at 2:42 AM Sagi Grimberg <sagi@...mberg.me> wrote:
>
>
>
> On 26/12/2025 19:54, Alex Tran wrote:
> > If a non-fatal error is received during nvme namespace validation, it
> > should not be ignored and the namespace should not be removed immediately.
> > Rather, delayed retries should be performed on the namespace validation
> > process.
> >
> > This handles non-fatal issues more robustly, by retrying a few times before
> > giving up and removing the namespace. The number of retries is set
> > to 3 and the interval between retries is set to 3 seconds.
> >
> > The retries are handled locally. Upon success then end. Upon fatal error
> > then remove the namespace before ending. Upon non-fatal error, retry
> > until the max retry amount is hit before ending.
> >
> > Signed-off-by: Alex Tran <alex.t.tran@...il.com>
> > ---
> > Changes in v2:
> > - Simplify retry logic with local loop instead of delayed work.
> > ---
> > drivers/nvme/host/core.c | 37 +++++++++++++++++++++++++++++--------
> > drivers/nvme/host/nvme.h | 6 ++++++
> > 2 files changed, 35 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 7bf228df6001f1f4d0b3c570de285a5eb17bb08e..6cd1ce4a60af55e5673e5fd0ec2053a028fae4f2 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -4289,23 +4289,44 @@ static void nvme_ns_remove_by_nsid(struct nvme_ctrl *ctrl, u32 nsid)
> > static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_info *info)
> > {
> > int ret = NVME_SC_INVALID_NS | NVME_STATUS_DNR;
> > + unsigned int i;
> >
> > if (!nvme_ns_ids_equal(&ns->head->ids, &info->ids)) {
> > - dev_err(ns->ctrl->device,
> > - "identifiers changed for nsid %d\n", ns->head->ns_id);
> > + dev_err(ns->ctrl->device, "identifiers changed for nsid %d\n",
> > + ns->head->ns_id);
> > goto out;
> > }
> >
> > - ret = nvme_update_ns_info(ns, info);
> > + for (i = 0; i <= NVME_NS_VALIDATION_MAX_RETRIES; i++) {
> > + ret = nvme_update_ns_info(ns, info);
> > + if (ret == 0)
> > + return;
> > +
> > + if (ret > 0 && (ret & NVME_STATUS_DNR))
> > + goto out;
> > +
> > + if (i == NVME_NS_VALIDATION_MAX_RETRIES) {
> > + dev_err(ns->ctrl->device,
> > + "validation failed for nsid %d after %d retries\n",
> > + ns->head->ns_id,
> > + NVME_NS_VALIDATION_MAX_RETRIES);
> > + return;
> > + }
>
> This is really redundant. just make the loop (i <
> NVME_NS_VALIDATION_MAX_RETRIES)
> and print in the end outside the loop.
> BTW, don't you need to remove the ns here?
>
In the original, a non-fatal error was just ignored without removing
the ns, so I assumed the same behavior, even when still receiving a
non-fatal error after the max retries. Unless it is preferred to
remove the namespace after failing max retry times?
Powered by blists - more mailing lists