[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <51fc1a3f-7166-49cf-9652-a95bdc5d2119@grimberg.me>
Date: Mon, 15 Apr 2024 23:51:19 +0300
From: Sagi Grimberg <sagi@...mberg.me>
To: Daniel Wagner <dwagner@...e.de>, Christoph Hellwig <hch@....de>
Cc: Keith Busch <kbusch@...nel.org>, James Smart <james.smart@...adcom.com>,
Hannes Reinecke <hare@...e.de>, linux-nvme@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 5/5] nvme-fabrics: handle transient auth failures
On 15/04/2024 15:42, Daniel Wagner wrote:
> Retry authentication a couple of times to avoid error out on transient
> errors. E.g. if a new key is deployed on the host and the target. At the
> same time the connection drops. The host might use the wrong key to
> reconnect.
>
> Suggested-by: Sagi Grimberg <sagi@...mberg.me>
> Signed-off-by: Daniel Wagner <dwagner@...e.de>
> ---
> drivers/nvme/host/auth.c | 4 ++--
> drivers/nvme/host/fabrics.c | 10 +++++++++-
> drivers/nvme/host/fc.c | 2 ++
> drivers/nvme/host/nvme.h | 1 +
> drivers/nvme/host/rdma.c | 1 +
> drivers/nvme/host/tcp.c | 1 +
> 6 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
> index a264b3ae078b..368dcc6ee11b 100644
> --- a/drivers/nvme/host/auth.c
> +++ b/drivers/nvme/host/auth.c
> @@ -797,7 +797,7 @@ static void nvme_queue_auth_work(struct work_struct *work)
> NVME_AUTH_DHCHAP_MESSAGE_SUCCESS1);
> if (ret) {
> chap->status = ret;
> - chap->error = -ECONNREFUSED;
> + chap->error = -EKEYREJECTED;
> return;
> }
>
> @@ -818,7 +818,7 @@ static void nvme_queue_auth_work(struct work_struct *work)
> ret = nvme_auth_process_dhchap_success1(ctrl, chap);
> if (ret) {
> /* Controller authentication failed */
> - chap->error = -ECONNREFUSED;
> + chap->error = -EKEYREJECTED;
> goto fail2;
> }
>
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index d9a73b1b41c4..6dfa45dce232 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -573,8 +573,16 @@ EXPORT_SYMBOL_GPL(nvmf_connect_io_queue);
> */
> bool nvmf_should_reconnect(struct nvme_ctrl *ctrl, int status)
> {
> - if (status < 0)
> + if (status < 0) {
> + /*
> + * authentication errors can be transient, thus retry a couple
> + * of times before giving up.
> + */
> + if (status == -EKEYREJECTED &&
> + ++ctrl->nr_auth_retries < 3)
> + return true;
I did not suggest nr_auth_retries. Where is the 3 coming from? The
controller already
has a number of reconnects before it gives up, no reason to add another one.
Just don't return false based on the status if it is a transient
authentication error.
The patch just needs to be modified from:
if (status < 0)
to
if (status < 0 && status != -EKEYREJECTED Plus a comment that
explains it.
Powered by blists - more mailing lists