[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <47ef76c6-4430-ca24-405e-a226d4303a39@grimberg.me>
Date: Fri, 6 Aug 2021 12:42:00 -0700
From: Sagi Grimberg <sagi@...mberg.me>
To: Daniel Wagner <dwagner@...e.de>, linux-nvme@...ts.infradead.org
Cc: linux-kernel@...r.kernel.org, Hannes Reinecke <hare@...e.de>,
yi.he@....com
Subject: Re: [PATCH] nvme-tcp: Do not reset transport on data digest errors
> The spec says
>
> 7.4.6.1 Digest Error handling
>
> When a host detects a data digest error in a C2HData PDU, that host
> shall continue processing C2HData PDUs associated with the command and
> when the command processing has completed, if a successful status was
> returned by the controller, the host shall fail the command with a
> non-fatal transport error.
>
> Currently the transport is reseted when a data digest error is
> detected. To fix this, keep track of the final status in the queue
> object and use it when completing the request.
>
> The new member can be placed adjacent to the receive related members and
> fits in the cacheline as there is a 4 byte hole.
>
> Signed-off-by: Daniel Wagner <dwagner@...e.de>
> ---
>
> Hi,
>
> I've tested this by modifying the receive path. Via the fault_inject
> interface I injecting wrong hash values. The request would then be
> completed with status != 0 and nvme_decide_disposition decices to
> retry the request. So this seems be in more sync with what the spec
> says on this topic.
>
> Daniel
>
> drivers/nvme/host/tcp.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 097f7dd10ed3..5253147df4c7 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -89,6 +89,7 @@ struct nvme_tcp_queue {
> size_t data_remaining;
> size_t ddgst_remaining;
> unsigned int nr_cqe;
> + u16 status;
Why is this a queue member and not a request member?
>
> /* send state */
> struct nvme_tcp_request *request;
> @@ -496,7 +497,8 @@ static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
> return -EINVAL;
> }
>
> - if (!nvme_try_complete_req(rq, cqe->status, cqe->result))
> + if (!nvme_try_complete_req(rq, queue->status ?
> + queue->status : cqe->status, cqe->result))
> nvme_complete_rq(rq);
> queue->nr_cqe++;
>
> @@ -676,6 +678,7 @@ static int nvme_tcp_recv_pdu(struct nvme_tcp_queue *queue, struct sk_buff *skb,
>
> switch (hdr->type) {
> case nvme_tcp_c2h_data:
> + queue->status = NVME_SC_SUCCESS;
> return nvme_tcp_handle_c2h_data(queue, (void *)queue->pdu);
> case nvme_tcp_rsp:
> nvme_tcp_init_recv_ctx(queue);
> @@ -758,7 +761,7 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
> queue->ddgst_remaining = NVME_TCP_DIGEST_LENGTH;
> } else {
> if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
> - nvme_tcp_end_request(rq, NVME_SC_SUCCESS);
> + nvme_tcp_end_request(rq, queue->status);
> queue->nr_cqe++;
> }
> nvme_tcp_init_recv_ctx(queue);
> @@ -792,14 +795,14 @@ static int nvme_tcp_recv_ddgst(struct nvme_tcp_queue *queue,
> "data digest error: recv %#x expected %#x\n",
> le32_to_cpu(queue->recv_ddgst),
> le32_to_cpu(queue->exp_ddgst));
> - return -EIO;
> + queue->status = NVME_SC_DATA_XFER_ERROR;
> }
>
> if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
> struct request *rq = nvme_cid_to_rq(nvme_tcp_tagset(queue),
> pdu->command_id);
>
> - nvme_tcp_end_request(rq, NVME_SC_SUCCESS);
> + nvme_tcp_end_request(rq, queue->status);
> queue->nr_cqe++;
> }
>
>
Powered by blists - more mailing lists