Message-ID: <CAPpK+O3zU_+UTGx044iCfsUzwD7Dy+X5DC=N6Dr9BNzrrjxqEQ@mail.gmail.com>
Date: Thu, 18 Dec 2025 18:06:02 -0800
From: Randy Jennings <randyj@...estorage.com>
To: Mohamed Khalfella <mkhalfella@...estorage.com>
Cc: Chaitanya Kulkarni <kch@...dia.com>, Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
Keith Busch <kbusch@...nel.org>, Sagi Grimberg <sagi@...mberg.me>,
Aaron Dailey <adailey@...estorage.com>, John Meneghini <jmeneghi@...hat.com>,
Hannes Reinecke <hare@...e.de>, linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 10/14] nvme-tcp: Use CCR to recover controller that
hits an error
On Tue, Nov 25, 2025 at 6:13 PM Mohamed Khalfella
<mkhalfella@...estorage.com> wrote:
>
> An alive nvme controller that hits an error now will move to RECOVERING
> state instead of RESETTING state. In RECOVERING state ctrl->err_work
> will attempt to use cross-controller recovery to terminate inflight IOs
> on the controller. If CCR succeeds, then switch to RESETTING state and
> continue error recovery as usuall by tearing down controller and attempt
> reconnecting to target. If CCR fails, then the behavior of recovery
"usuall" -> "usual"
"attempt reconnecting" -> "attempting to reconnect"
it would read better with "the" added:
"tearing down the controller"
"reconnect to the target"
> depends on whether CQT is supported or not. If CQT is supported, switch
> to time-based recovery by holding inflight IOs until it is safe for them
> to be retried. If CQT is not supported proceed to retry requests
> immediately, as the code currently does.
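Just to check my understanding of the flow described above, this is
roughly the decision tree I read it as (a rough sketch only; the
ccr_terminate_inflight() and cqt_supported() helpers below are
placeholder names I made up, not functions from this series):

        /*
         * Sketch of the recovery flow as I read the description above.
         * ccr_terminate_inflight() and cqt_supported() are placeholders,
         * not names from this series.
         */
        static void recovery_flow_sketch(struct nvme_ctrl *ctrl)
        {
                /* The error path now enters RECOVERING instead of RESETTING. */
                if (!ccr_terminate_inflight(ctrl)) {
                        /*
                         * CCR succeeded: switch to RESETTING, tear down the
                         * controller and attempt to reconnect to the target.
                         */
                        nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING);
                        return;
                }

                if (cqt_supported(ctrl)) {
                        /*
                         * CCR failed but CQT is supported: hold inflight IOs
                         * and requeue err_work until it is safe to retry them.
                         */
                        return;
                }

                /* CCR failed, no CQT: retry requests immediately (today's behavior). */
        }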
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> +static int nvme_tcp_recover_ctrl(struct nvme_ctrl *ctrl)
> + dev_info(ctrl->device,
> + "CCR failed, switch to time-based recovery, timeout = %ums\n",
> + jiffies_to_msecs(rem));
> + set_bit(NVME_CTRL_RECOVERED, &ctrl->flags);
> + queue_delayed_work(nvme_reset_wq, &to_tcp_ctrl(ctrl)->err_work, rem);
> + return -EAGAIN;
I see how setting this bit before the delayed work runs allows the
recovery to complete, but it is kind of weird that the bit is called
RECOVERED while we are still waiting. I do not have a better name.
TIME_BASED_RECOVERY? RECOVERY_WAIT?
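Purely for illustration, with one of those names the hunk above would
read something like (untested, nothing changed except the flag name):

        dev_info(ctrl->device,
                 "CCR failed, switch to time-based recovery, timeout = %ums\n",
                 jiffies_to_msecs(rem));
        set_bit(NVME_CTRL_RECOVERY_WAIT, &ctrl->flags); /* was NVME_CTRL_RECOVERED */
        queue_delayed_work(nvme_reset_wq, &to_tcp_ctrl(ctrl)->err_work, rem);
        return -EAGAIN;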
> static void nvme_tcp_error_recovery_work(struct work_struct *work)
> {
> - struct nvme_tcp_ctrl *tcp_ctrl = container_of(work,
> + struct nvme_tcp_ctrl *tcp_ctrl = container_of(to_delayed_work(work),
> struct nvme_tcp_ctrl, err_work);
> struct nvme_ctrl *ctrl = &tcp_ctrl->ctrl;
>
> + if (nvme_ctrl_state(ctrl) == NVME_CTRL_RECOVERING) {
> + if (nvme_tcp_recover_ctrl(ctrl))
> + return;
> + }
> +
> if (nvme_tcp_key_revoke_needed(ctrl))
> nvme_auth_revoke_tls_key(ctrl);
> nvme_stop_keep_alive(ctrl);
The controller state should not be LIVE while waiting for recovery, so
I do not think we will succeed in sending keep-alives, but I think the
nvme_stop_keep_alive() call should move to before (or inside of)
nvme_tcp_recover_ctrl().
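Something along these lines is what I had in mind (untested sketch,
only moving the existing nvme_stop_keep_alive() call; everything else
as in your patch):

        static void nvme_tcp_error_recovery_work(struct work_struct *work)
        {
                struct nvme_tcp_ctrl *tcp_ctrl = container_of(to_delayed_work(work),
                                struct nvme_tcp_ctrl, err_work);
                struct nvme_ctrl *ctrl = &tcp_ctrl->ctrl;

                nvme_stop_keep_alive(ctrl);     /* moved before the CCR/CQT wait */

                if (nvme_ctrl_state(ctrl) == NVME_CTRL_RECOVERING) {
                        if (nvme_tcp_recover_ctrl(ctrl))
                                return;
                }

                if (nvme_tcp_key_revoke_needed(ctrl))
                        nvme_auth_revoke_tls_key(ctrl);
                ...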
Sincerely,
Randy Jennings