Message-ID: <Y6d37vGSCKvfJhzD@kbusch-mbp.dhcp.thefacebook.com>
Date: Sat, 24 Dec 2022 15:06:38 -0700
From: Keith Busch <kbusch@...nel.org>
To: Hugh Dickins <hughd@...gle.com>
Cc: Christoph Hellwig <hch@...radead.org>,
Jens Axboe <axboe@...nel.dk>, Sagi Grimberg <sagi@...mberg.me>,
Chaitanya Kulkarni <kch@...dia.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Thorsten Leemhuis <regressions@...mhuis.info>,
linux-block@...r.kernel.org, linux-nvme@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: 6.2 nvme-pci: something wrong
On Fri, Dec 23, 2022 at 09:24:56PM -0800, Hugh Dickins wrote:
> Hi Christoph,
>
> There's something wrong with the nvme-pci heading for 6.2-rc1:
> no problem booting here on this Lenovo ThinkPad X1 Carbon 5th,
> but under load...
>
> nvme nvme0: I/O 0 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 1 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 2 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 3 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: I/O 0 QID 2 timeout, reset controller
>
> ...and more, until I just have to poweroff and reboot.
>
> Bisection points to your
> 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
> And that does revert cleanly, giving a kernel which shows no problem.
>
> I've spent a while comparing old nvme_pci_alloc_tag_set() and new
> nvme_alloc_io_tag_set(), I do not know my way around there at all
> and may be talking nonsense, but it did look as if there might now
> be a difference in the queue_depth, sqsize, q_depth conversions.
>
> I'm running load successfully with the patch below, but I strongly
> suspect that the right patch will be somewhere else: over to you!
>
> Hugh
>
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4926,7 +4926,7 @@ int nvme_alloc_io_tag_set(struct nvme_ct
>
> memset(set, 0, sizeof(*set));
> set->ops = ops;
> - set->queue_depth = ctrl->sqsize + 1;
> + set->queue_depth = ctrl->sqsize;
What you're observing is a queue-wrap condition that makes it impossible
for the controller to know there are new commands.
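To make the wrap concrete, here's a toy ring outside the driver (made-up
names, nothing from the nvme code) showing why head == tail can't tell
"full" apart from "empty" unless one slot is always left free:

/*
 * Minimal sketch (not kernel code) of a circular submission queue.
 * The host advances "tail" as it posts commands and the controller
 * advances "head" as it consumes them; the queue is treated as empty
 * when head == tail.  If the host could post q_depth commands while
 * none complete, tail would wrap all the way around to head and a
 * completely full queue would look identical to an empty one, so the
 * controller would never fetch the new commands.
 */
#include <stdbool.h>
#include <stdint.h>

struct sq_sketch {
	uint16_t head;		/* consumer (controller) index */
	uint16_t tail;		/* producer (host) index       */
	uint16_t q_depth;	/* number of slots in the ring */
};

/* Host side: may we post another command? */
static bool sq_can_submit(const struct sq_sketch *sq)
{
	/* Leave one slot empty: "full" at q_depth - 1 outstanding. */
	return ((sq->tail + 1) % sq->q_depth) != sq->head;
}

/* Controller side: is there anything to fetch? */
static bool sq_has_work(const struct sq_sketch *sq)
{
	return sq->head != sq->tail;	/* head == tail means "empty" */
}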
Your patch does look like the correct thing to do. The "zero means one"
thing is a confusing distraction, I think. It makes more sense if you
consider sqsize as the maximum number of tags we can have outstanding at
one time, and it looks like all the drivers set it that way. We're
supposed to leave one slot empty for a full NVMe queue, so adding one
here to report the total number of slots isn't right, since that would
allow us to fill all the slots.
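For what it's worth, here's roughly how I read the arithmetic under that
interpretation of sqsize, with a made-up 1024-slot controller purely for
illustration:

#include <assert.h>
#include <stdint.h>

int main(void)
{
	/* Hypothetical controller advertising 1024 queue slots. */
	uint16_t mqes   = 1023;		/* CAP.MQES, a zero's based field  */
	uint16_t slots  = mqes + 1;	/* 1024 entries in the ring        */
	uint16_t sqsize = slots - 1;	/* 1023 = max commands outstanding */

	/* With the change above: never more tags than slots minus one. */
	uint16_t queue_depth = sqsize;	/* 1023 */
	assert(queue_depth < slots);

	/*
	 * Before the change, queue_depth = sqsize + 1 == slots, so blk-mq
	 * could hand out enough tags to fill every slot and hit the wrap
	 * condition described above.
	 */
	return 0;
}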
Fabrics drivers have been using this method for a while, though, so it's
interesting that they haven't hit a similar problem.