Message-ID: <2d9257c7-de3e-42ea-a947-25e394146f57@grimberg.me>
Date: Mon, 14 Apr 2025 01:25:05 +0300
From: Sagi Grimberg <sagi@...mberg.me>
To: Michael Liang <mliang@...estorage.com>, Keith Busch <kbusch@...nel.org>,
Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>
Cc: Mohamed Khalfella <mkhalfella@...estorage.com>,
Randy Jennings <randyj@...estorage.com>, linux-nvme@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nvme-tcp: wait socket wmem to drain in queue stop
On 05/04/2025 8:48, Michael Liang wrote:
> This patch addresses a data corruption issue observed in nvme-tcp during
> testing.
>
> Issue description:
> In an NVMe native multipath setup, when an I/O timeout occurs, all inflight
> I/Os are canceled almost immediately after the kernel socket is shut down.
> These canceled I/Os are reported as host path errors, triggering a failover
> that succeeds on a different path.
>
> However, at this point, the original I/O may still be outstanding in the
> host's network transmission path (e.g., the NIC's TX queue). From the
> user-space application's perspective, the buffer associated with the I/O is
> considered complete once the I/O is acked on the other path, and it may be
> reused for new I/O requests.
>
> Because nvme-tcp enables zero-copy by default in the transmission path,
> this can lead to the reused buffer contents being sent to the original
> target, ultimately causing data corruption.
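
A minimal user-space sketch of the ownership rule at issue, assuming the
MSG_ZEROCOPY interface from Documentation/networking/msg_zerocopy.rst
(which is analogous to, but not the same code path as, the in-kernel
zero-copy send that nvme-tcp performs; the helper name below is
illustrative only): after a zero-copy send() the pages still belong to
the TCP stack, and reusing them before the completion notification
arrives is the corruption window described above.

#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY	60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY	0x4000000
#endif

static int send_zerocopy_and_wait(int fd, const void *buf, size_t len)
{
	struct pollfd pfd = { .fd = fd, .events = 0 };
	char control[128];
	struct msghdr msg = {
		.msg_control = control,
		.msg_controllen = sizeof(control),
	};
	int one = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)))
		return -errno;

	/* The pages behind buf are referenced by the stack, not copied. */
	if (send(fd, buf, len, MSG_ZEROCOPY) != (ssize_t)len)
		return -errno;

	/*
	 * Do NOT reuse buf yet: the NIC may still read these pages.
	 * Wait for the zero-copy completion on the error queue
	 * (POLLERR is reported even though no event is requested).
	 */
	if (poll(&pfd, 1, -1) < 0)
		return -errno;
	if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
		return -errno;

	/* Completion received: only now is it safe to reuse buf. */
	return 0;
}
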
This is unexpected.
1. Before retrying the command, the host shuts down the socket.
2. The host sets sk_lingertime to 0, which means that as soon as the
   socket is shut down, no packet should be transmitted on that socket
   again, zero-copy or not.

Perhaps there is something not handled correctly with linger=0? Perhaps
you should try with linger=<some-timeout>?
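
In case it helps while experimenting, a minimal sketch of the two linger
settings in user-space terms, assuming the standard SO_LINGER socket
option (the kernel side sets sk_lingertime directly, so this only
illustrates the semantics; the 5-second value is just a placeholder):

#include <sys/socket.h>

static int set_linger(int fd, int timeout_sec)
{
	struct linger lg = {
		.l_onoff  = 1,
		/*
		 * 0: drop unsent data and reset immediately on close().
		 * >0: close() waits up to timeout_sec seconds for the
		 *     send queue to drain.
		 */
		.l_linger = timeout_sec,
	};

	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* current behaviour:   set_linger(fd, 0); */
/* suggested test:      set_linger(fd, 5); */

Whether shutdown() on a lingering socket gives the same guarantee as
close() is the open question above.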