Message-ID: <Z8m9AbD3tjNpBt6p@kbusch-mbp>
Date: Thu, 6 Mar 2025 08:19:29 -0700
From: Keith Busch <kbusch@...nel.org>
To: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
Cc: Christoph Hellwig <hch@....de>, axboe@...nel.dk,
	linux-nvme@...ts.infradead.org,
	Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
	Linux regressions mailing list <regressions@...ts.linux.dev>
Subject: Re: 6.13/regression/bisected - new nvme timeout errors

On Wed, Jan 15, 2025 at 02:58:04AM +0500, Mikhail Gavrilov wrote:
> Hi,
> During the 6.13 development cycle I spotted strange new nvme errors in the
> log which I had never seen before.
> 
> [87774.010474] nvme nvme1: I/O tag 0 (3000) opcode 0x1 (I/O Cmd) QID 1 timeout, aborting req_op:WRITE(1) size:131072

...

> I still haven't found a stable way to reproduce this.
> But I'm pretty sure that if this error doesn't appear within two
> days, then we can assume that the kernel isn't affected by the
> problem.
> So I ran a bisection with the above assumption and found this commit:
> 
> beadf0088501d9dcf2454b05d90d5d31ea3ba55f is the first bad commit
> commit beadf0088501d9dcf2454b05d90d5d31ea3ba55f
> Author: Christoph Hellwig <hch@....de>
> Date:   Wed Nov 13 16:20:41 2024 +0100
> 
>     nvme-pci: reverse request order in nvme_queue_rqs

The patch here uses the order received to dispatch commands in
consecutive submission queue entries, which is supposed to be the
desired behavior for any device. I did some testing on mainline, and it
sure looks like the order the driver uses is optimal, so I'm not
sure what's going on with your observation.

Do you have a scheduler enabled on your device?

How are you generating IO? Is it a pattern I should be able to replicate
with 'fio'?
