Message-ID: <aYx1xoY_yeNvTtF2@kbusch-mbp>
Date: Wed, 11 Feb 2026 05:27:50 -0700
From: Keith Busch <kbusch@...nel.org>
To: Junnan Zhang <zhangjn_dev@....com>
Cc: axboe@...nel.dk, hch@....de, linux-kernel@...r.kernel.org,
linux-nvme@...ts.infradead.org, liuyx92@...natelecom.cn,
sagi@...mberg.me, sunshx@...natelecom.cn, yuanql9@...natelecom.cn,
zhangjn11@...natelecom.cn, zhangzl68@...natelecom.cn
Subject: Re: [PATCH] nvme-pci: fix potential I/O hang when CQ is full

On Wed, Feb 11, 2026 at 05:47:44PM +0800, Junnan Zhang wrote:
> On Tue, 10 Feb 2026 16:57:12 +0100, Christoph Hellwig wrote:
>
> > We can't update the CQ head before consuming the CQEs, otherwise
> > the device can reuse them. And devices must not discard completions
> > when there is no free completion queue entry; NVMe does allow SQs
> > and CQs to be smaller than the number of outstanding commands.
>
> Updating the CQ head before consuming the CQE would not cause the
> device to reuse those entries: the driver only submits new commands
> after the CQE has been consumed, so the device never gets the
> opportunity to reuse them.

That's just an artifact of how this host implementation constrains its
tag space. It's not a reflection of how the NVMe protocol fundamentally
works.
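
To make the ordering concrete, here's a minimal sketch of a CQ
consumer in C (illustrative types and names only -- sketch_cq,
process_cq and friends are not the actual nvme-pci code, and bit 0 of
"status" stands in for the phase tag): the CQE has to be consumed
before the head moves, because the doorbell write is what hands the
slot back to the device.

#include <stdint.h>

struct sketch_cqe {
	uint16_t command_id;
	uint16_t status;		/* bit 0: phase tag */
};

struct sketch_cq {
	struct sketch_cqe *entries;
	volatile uint32_t *doorbell;
	uint16_t head, depth;
	uint8_t phase;
};

static void complete_command(uint16_t command_id)
{
	/* hand the result back to the submitter; details elided */
	(void)command_id;
}

static void process_cq(struct sketch_cq *cq)
{
	while ((cq->entries[cq->head].status & 1) == cq->phase) {
		/* 1) consume the CQE while the host still owns the slot */
		complete_command(cq->entries[cq->head].command_id);

		/* 2) only then advance the head; a wrap flips the phase */
		if (++cq->head == cq->depth) {
			cq->head = 0;
			cq->phase ^= 1;
		}
	}
	/* 3) the doorbell write frees the consumed slots for reuse */
	*cq->doorbell = cq->head;
}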

A full queue is not an error. It's a spec-defined condition that the
submitter simply has to deal with. The protocol was specifically
designed to allow dispatching more outstanding commands than the
queues can hold. Your controller is broken.
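
As a sketch of what "deal with" means on the submission side (again
made-up names, not driver code): a full SQ is back-pressure, so the
caller backs off and retries after the next completion instead of
failing the command.

#include <errno.h>
#include <stdint.h>

struct sketch_sqe {
	uint8_t bytes[64];		/* an SQE is 64 bytes */
};

struct sketch_sq {
	struct sketch_sqe *entries;
	volatile uint32_t *doorbell;
	uint16_t head, tail, depth;	/* head as last reported in CQEs */
};

static int sq_is_full(const struct sketch_sq *sq)
{
	return (uint16_t)((sq->tail + 1) % sq->depth) == sq->head;
}

static int submit_command(struct sketch_sq *sq,
			  const struct sketch_sqe *sqe)
{
	if (sq_is_full(sq))
		return -EBUSY;	/* caller requeues; not a failure */

	sq->entries[sq->tail] = *sqe;
	sq->tail = (sq->tail + 1) % sq->depth;
	*sq->doorbell = sq->tail;	/* publish the new tail */
	return 0;
}

In blk-mq terms that's the same idea as returning BLK_STS_RESOURCE
from ->queue_rq and letting the block layer requeue the request.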