Message-ID: <a306a2b0-bd7c-4376-8c26-738b5c7c9807@grimberg.me>
Date: Sat, 26 Apr 2025 00:53:30 +0300
From: Sagi Grimberg <sagi@...mberg.me>
To: Michael Liang <mliang@...estorage.com>, Keith Busch <kbusch@...nel.org>,
 Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>
Cc: Mohamed Khalfella <mkhalfella@...estorage.com>,
 Randy Jennings <randyj@...estorage.com>, linux-nvme@...ts.infradead.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] nvme-tcp: wait socket wmem to drain in queue stop

Given that this is a fix, let's rename the patch title to say that:

nvme-tcp: fix possible data corruption caused by premature queue removal 
and I/O failover


On 24/04/2025 19:17, Michael Liang wrote:
> This patch addresses a data corruption issue observed in nvme-tcp during
> testing.
>
> Issue description:
> In an NVMe native multipath setup, when an I/O timeout occurs, all inflight
> I/Os are canceled almost immediately after the kernel socket is shut down.
> These canceled I/Os are reported as host path errors, triggering a failover
> that succeeds on a different path.
>
> However, at this point, the original I/O may still be outstanding in the
> host's network transmission path (e.g., the NIC's TX queue). From the
> user-space application's perspective, the buffer associated with the I/O is
> considered complete, since the I/O was acknowledged on the other path, and
> the buffer may be reused for new I/O requests.
>
> Because nvme-tcp enables zero-copy by default in the transmission path,
> this can lead to corrupted data being sent to the original target, ultimately
> causing data corruption.
>
> We can reproduce this data corruption by injecting delay on one path and
> triggering an I/O timeout.
>
> To prevent this issue, this change ensures that all inflight transmissions are
> fully completed from the host's perspective before returning from queue
> stop. To handle concurrent I/O timeouts from multiple namespaces under
> the same controller, always wait in queue stop regardless of the queue's state.
>
> This aligns with the behavior of queue stopping in other NVMe fabric transports.
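[Aside: the drain-wait described above can be sketched in userspace. This is a hypothetical analogue, not the patch itself; the kernel-side fix checks the socket's write memory inside nvme-tcp's queue-stop path. In userspace, the SIOCOUTQ ioctl reports how many bytes the kernel still holds in a socket's send queue, so proceeding only once it reads zero guarantees no in-kernel buffer still references the sender's pages.]

```c
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>	/* SIOCOUTQ */

/* Hypothetical userspace analogue of the queue-stop drain wait: poll
 * SIOCOUTQ until the kernel reports no bytes left in this socket's send
 * queue, or give up after max_ms milliseconds. Returns 0 once drained,
 * -1 on error or timeout. */
static int wait_send_queue_drained(int fd, int max_ms)
{
	for (int waited = 0; waited < max_ms; waited++) {
		int pending = 0;

		if (ioctl(fd, SIOCOUTQ, &pending) < 0)
			return -1;	/* bad fd or ioctl unsupported */
		if (pending == 0)
			return 0;	/* nothing left inflight */
		usleep(1000);		/* data still queued; re-check in 1 ms */
	}
	return -1;			/* timed out with data still inflight */
}
```

Only after such a wait succeeds is it safe to treat the I/O's buffers as free for reuse on this path, which is why the patch performs it before returning from queue stop.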

We need a "Fixes: " tag, even if it goes all the way to day-0...

>
> Reviewed-by: Mohamed Khalfella <mkhalfella@...estorage.com>
> Reviewed-by: Randy Jennings <randyj@...estorage.com>
> Signed-off-by: Michael Liang <mliang@...estorage.com>

Please resend, and you can add to your v4:
Reviewed-by: Sagi Grimberg <sagi@...mberg.me>
