Message-ID: <2d9257c7-de3e-42ea-a947-25e394146f57@grimberg.me>
Date: Mon, 14 Apr 2025 01:25:05 +0300
From: Sagi Grimberg <sagi@...mberg.me>
To: Michael Liang <mliang@...estorage.com>, Keith Busch <kbusch@...nel.org>,
Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>
Cc: Mohamed Khalfella <mkhalfella@...estorage.com>,
Randy Jennings <randyj@...estorage.com>, linux-nvme@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nvme-tcp: wait socket wmem to drain in queue stop
On 05/04/2025 8:48, Michael Liang wrote:
> This patch addresses a data corruption issue observed in nvme-tcp during
> testing.
>
> Issue description:
> In an NVMe native multipath setup, when an I/O timeout occurs, all inflight
> I/Os are canceled almost immediately after the kernel socket is shut down.
> These canceled I/Os are reported as host path errors, triggering a failover
> that succeeds on a different path.
>
> However, at this point, the original I/O may still be outstanding in the
> host's network transmission path (e.g., the NIC's TX queue). From the
> user-space application's perspective, the buffer associated with the I/O is
> considered complete once the I/O is acked on the other path, and it may be
> reused for new I/O requests.
>
> Because nvme-tcp enables zero-copy by default in the transmission path,
> this can lead to the reused buffer contents being sent to the original
> target, ultimately causing data corruption.
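
A minimal user-space sketch of the ownership rule at issue, assuming the
MSG_ZEROCOPY interface from Documentation/networking/msg_zerocopy.rst
(which is analogous to, but not the same code path as, the in-kernel
zero-copy send that nvme-tcp performs; the helper name below is
illustrative only): after a zero-copy send() the pages still belong to
the TCP stack, and reusing them before the completion notification
arrives is the corruption window described above.

#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY	60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY	0x4000000
#endif

static int send_zerocopy_and_wait(int fd, const void *buf, size_t len)
{
	struct pollfd pfd = { .fd = fd, .events = 0 };
	char control[128];
	struct msghdr msg = {
		.msg_control = control,
		.msg_controllen = sizeof(control),
	};
	int one = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)))
		return -errno;

	/* The pages behind buf are referenced by the stack, not copied. */
	if (send(fd, buf, len, MSG_ZEROCOPY) != (ssize_t)len)
		return -errno;

	/*
	 * Do NOT reuse buf yet: the NIC may still read these pages.
	 * Wait for the zero-copy completion on the error queue
	 * (POLLERR is reported even though no event is requested).
	 */
	if (poll(&pfd, 1, -1) < 0)
		return -errno;
	if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
		return -errno;

	/* Completion received: only now is it safe to reuse buf. */
	return 0;
}
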
This is unexpected.
1. Before retrying the command, the host shuts down the socket.
2. The host sets sk_lingertime to 0, which means that as soon as the
   socket is shut down, no packet should be transmitted on that socket
   again, zero-copy or not.

Perhaps there is something not handled correctly with linger=0? Perhaps
you should try with linger=<some-timeout>?
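
In case it helps while experimenting, a minimal sketch of the two linger
settings in user-space terms, assuming the standard SO_LINGER socket
option (the kernel side sets sk_lingertime directly, so this only
illustrates the semantics; the 5-second value is just a placeholder):

#include <sys/socket.h>

static int set_linger(int fd, int timeout_sec)
{
	struct linger lg = {
		.l_onoff  = 1,
		/*
		 * 0: drop unsent data and reset immediately on close().
		 * >0: close() waits up to timeout_sec seconds for the
		 *     send queue to drain.
		 */
		.l_linger = timeout_sec,
	};

	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* current behaviour:   set_linger(fd, 0); */
/* suggested test:      set_linger(fd, 5); */

Whether shutdown() on a lingering socket gives the same guarantee as
close() is the open question above.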