Message-ID: <CABXGCsMd_xv8jPDF_sFYhwd8GtANZ23nbaSJuCxQRO7cjPtgWg@mail.gmail.com>
Date: Sun, 23 Mar 2025 14:49:40 +0500
From: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
To: Keith Busch <kbusch@...nel.org>, mlombard@...backstore.eu
Cc: Christoph Hellwig <hch@....de>, axboe@...nel.dk, linux-nvme@...ts.infradead.org,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Linux regressions mailing list <regressions@...ts.linux.dev>
Subject: Re: 6.13/regression/bisected - new nvme timeout errors
On Thu, Mar 6, 2025 at 8:19 PM Keith Busch <kbusch@...nel.org> wrote:
>
> On Wed, Jan 15, 2025 at 02:58:04AM +0500, Mikhail Gavrilov wrote:
> > Hi,
> > During the 6.13 development cycle I spotted strange new nvme errors in the
> > log which I had never seen before.
> >
> > [87774.010474] nvme nvme1: I/O tag 0 (3000) opcode 0x1 (I/O Cmd) QID 1 timeout, aborting req_op:WRITE(1) size:131072
>
> ...
>
> > I still haven't found a stable way to reproduce this.
> > But I'm pretty sure that if this error doesn't appear within two
> > days, then we can assume that the kernel isn't affected by the
> > problem.
> > So I bisected with the above assumption and found this commit:
> >
> > beadf0088501d9dcf2454b05d90d5d31ea3ba55f is the first bad commit
> > commit beadf0088501d9dcf2454b05d90d5d31ea3ba55f
> > Author: Christoph Hellwig <hch@....de>
> > Date: Wed Nov 13 16:20:41 2024 +0100
> >
> > nvme-pci: reverse request order in nvme_queue_rqs
>
> The patch here uses the order received to dispatch commands in
> consecutive submission queue entries, which is supposed to be the
> desired behavior for any device. I did some testing on mainline, and it
> sure looks like the order the driver does this in is optimal, so I'm not
> sure what's going on with your observation.
>
> Do you have a scheduler enabled on your device?
>
> How are you generating IO? Is it a pattern I should be able to replicate
> with 'fio'?
Sorry for the long silence; I was busy testing an off-list patch that
Maurizio Lombardi sent me:
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 950289405ef2..990d9ef1eef9 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -989,6 +989,9 @@ static void nvme_submit_cmds(struct nvme_queue *nvmeq, struct rq_list *rqlist)
 {
 	struct request *req;
 
+	if (rq_list_empty(rqlist))
+		return;
+
 	spin_lock(&nvmeq->sq_lock);
 	while ((req = rq_list_pop(rqlist))) {
 		struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
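If I read drivers/nvme/host/pci.c correctly, nvme_submit_cmds() copies the
batch into the submission queue and then rings the SQ doorbell via
nvme_write_sq_db() under sq_lock, so the new early return means an empty
rqlist no longer takes the lock and writes the doorbell with nothing queued.
To illustrate the idea outside the kernel, here is a small self-contained
toy model I sketched; the names and the doorbell bookkeeping are only an
analogy for the real driver path, not the actual pci.c code:

/* Toy model of a batched submission path: commands are copied into a small
 * ring in the order they arrive, and a single "doorbell" write tells the
 * (pretend) device how far the tail moved.  Flushing an empty batch would
 * ring the doorbell without queueing anything new, which the early-return
 * guard avoids, mirroring the rq_list_empty() check in the patch above. */
#include <stdio.h>

#define SQ_DEPTH 8

struct toy_queue {
	int sq[SQ_DEPTH];          /* pretend submission queue entries */
	unsigned int sq_tail;      /* next free slot */
	unsigned int db_value;     /* last tail value written to the doorbell */
};

static void toy_write_sq_db(struct toy_queue *q)
{
	/* In the real driver this would be an MMIO write; here we just log it. */
	q->db_value = q->sq_tail;
	printf("doorbell <- %u\n", q->db_value);
}

static void toy_submit_cmds(struct toy_queue *q, const int *cmds, int n)
{
	/* Mirror of the patch: do nothing for an empty batch, so we never
	 * ring the doorbell without having queued new entries. */
	if (n == 0)
		return;

	/* Copy the batch in the order given, i.e. consecutive SQ slots. */
	for (int i = 0; i < n; i++) {
		q->sq[q->sq_tail] = cmds[i];
		q->sq_tail = (q->sq_tail + 1) % SQ_DEPTH;
	}
	toy_write_sq_db(q);
}

int main(void)
{
	struct toy_queue q = { .sq_tail = 0 };
	int batch[] = { 1, 2, 3 };

	toy_submit_cmds(&q, batch, 3);  /* one doorbell write for the batch */
	toy_submit_cmds(&q, NULL, 0);   /* guarded: no spurious doorbell write */
	return 0;
}

In the toy model the guarded call with an empty batch produces no doorbell
write at all, which as far as I understand is exactly what the patch is
meant to avoid in the real submission path.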
I want to say right away that this patch solved my problem. I
double-checked, and in both runs the system worked for four days without
errors like:
nvme nvme0: I/O tag 960 (33c0) opcode 0x2 (I/O Cmd) QID 7 timeout, aborting req_op:READ(0) size:16384
nvme nvme0: I/O tag 961 (33c1) opcode 0x2 (I/O Cmd) QID 7 timeout, aborting req_op:READ(0) size:8192
nvme nvme0: Abort status: 0x0
* * *
nvme nvme0: I/O tag 960 (33c0) opcode 0x2 (I/O Cmd) QID 7 timeout, reset controller
and without the subsequent freezing of some applications.
> Do you have a scheduler enabled on your device?
No. nvme0 is a server-tier 30 TB SSD.
$ head /sys/block/*/queue/scheduler
==> /sys/block/nvme0n1/queue/scheduler <==
[none] mq-deadline kyber bfq
==> /sys/block/nvme1n1/queue/scheduler <==
[none] mq-deadline kyber bfq
==> /sys/block/sda/queue/scheduler <==
none mq-deadline kyber [bfq]
==> /sys/block/sdb/queue/scheduler <==
none mq-deadline kyber [bfq]
> How are you generating IO? Is it a pattern I should be able to replicate with 'fio'?
It's complicated, because it happens only when I update (podman pull) a
large (16 GB) container image.
$ podman images
REPOSITORY                                           TAG             IMAGE ID      CREATED       SIZE
dev-image-store.my-company.com/local-stand/service   25.2100-latest  513139713b48  19 hours ago  16.5 GB
dev-image-store.my-company.com/local-stand/service   25.1220-latest  920708d4b3dc  9 days ago    16.8 GB
dev-image-store.my-company.com/local-stand/service   25.1220         67d2af91f323  10 days ago   16.8 GB
dev-image-store.my-company.com/local-stand/service   25.2100         24e6817bff52  3 weeks ago   16.4 GB
Podman stores its images on the 30 TB Btrfs partition mounted at /home.
--
Best Regards,
Mike Gavrilov.