Message-ID: <Y6amzxU7choHAXWi@infradead.org>
Date: Fri, 23 Dec 2022 23:14:23 -0800
From: Christoph Hellwig <hch@...radead.org>
To: Hugh Dickins <hughd@...gle.com>
Cc: Christoph Hellwig <hch@...radead.org>,
Jens Axboe <axboe@...nel.dk>, Keith Busch <kbusch@...nel.org>,
Sagi Grimberg <sagi@...mberg.me>,
Chaitanya Kulkarni <kch@...dia.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Thorsten Leemhuis <regressions@...mhuis.info>,
linux-block@...r.kernel.org, linux-nvme@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: 6.2 nvme-pci: something wrong
On Fri, Dec 23, 2022 at 09:24:56PM -0800, Hugh Dickins wrote:
> Hi Christoph,
>
> There's something wrong with the nvme-pci heading for 6.2-rc1:
> no problem booting here on this Lenovo ThinkPad X1 Carbon 5th,
> but under load...
>
> nvme nvme0: I/O 0 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 1 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 2 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 3 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: I/O 0 QID 2 timeout, reset controller
>
> ...and more, until I just have to poweroff and reboot.
>
> Bisection points to your
> 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
> And that does revert cleanly, giving a kernel which shows no problem.
>
> I've spent a while comparing the old nvme_pci_alloc_tag_set() and new
> nvme_alloc_io_tag_set(); I do not know my way around there at all
> and may be talking nonsense, but it did look as if there might now
> be a difference in the queue_depth, sqsize, q_depth conversions.
>
> I'm running load successfully with the patch below, but I strongly
> suspect that the right patch will be somewhere else: over to you!

Thanks for the report!  The patch is definitely wrong: ->sqsize
holds one of the awful so-called 'zero's based values' in NVMe,
where 0 means 1, and thus has a built-in off-by-one.  We should
probably convert it to a sane value at read time, but that's a
separate discussion.
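
For illustration, a minimal userspace sketch of that convention
(not driver code; the q_depth/sqsize names just mirror the driver
fields):

#include <stdio.h>

int main(void)
{
	unsigned int q_depth = 1024;		/* actual number of SQ entries */
	unsigned int sqsize = q_depth - 1;	/* 'zero's based': 0 means 1 */

	/* anyone reading sqsize must remember to add the one back */
	printf("sqsize=%u encodes a queue depth of %u\n",
	       sqsize, sqsize + 1);
	return 0;
}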
I suspect your controller is one of those where we quirk the queue
size, and the commit you bisected fails to reflect that quirk in the
common sqsize value.  The patch below should be the minimum fix, and
in the long term the duplicate bookkeeping for it in the PCI driver
should go away:

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f0f8027644bbf8..a73c0ee7bd1892 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2536,7 +2536,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)
 
 	dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
 				io_queue_depth);
-	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
 	dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
 	dev->dbs = dev->bar + 4096;
 
@@ -2577,7 +2576,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
 		dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
 			 dev->q_depth);
 	}
-
+	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
 	nvme_map_cmb(dev);