Message-ID: <aAuu1RvgwyfXI3AL@kbusch-mbp.dhcp.thefacebook.com>
Date: Fri, 25 Apr 2025 09:48:37 -0600
From: Keith Busch <kbusch@...nel.org>
To: Linjun Bao <meljbao@...il.com>
Cc: Jens Axboe <axboe@...com>, Christoph Hellwig <hch@....de>,
	Sagi Grimberg <sagi@...mberg.me>, linux-kernel@...r.kernel.org,
	linux-nvme@...ts.infradead.org
Subject: Re: [PATCH] nvme: avoid missing db ring during reset

On Fri, Apr 25, 2025 at 08:01:45PM +0800, Linjun Bao wrote:
> During nvme reset, there is a rare case where a user admin command such
> as smart-log and the nvme_admin_create_sq from nvme_setup_io_queues
> happen to be in the same blk_mq dispatch list, with the user command
> last. nvme_admin_create_sq is dispatched first in
> nvme_queue_rq(); nvme_write_sq_db() is called but returns immediately
> without writing the doorbell because the request is not marked
> "last". The subsequent smart-log ioctl fails fast in
> nvme_fail_nonready_cmd(), skipping both nvme_sq_copy_cmd() and
> nvme_write_sq_db(), so no doorbell write ever occurs. The
> nvme_admin_create_sq command eventually times out.

The block layer is supposed to call the driver's commit_rqs() function
if anything in the dispatch list wasn't successful, which should notify
the controller of any pending SQEs. Is that not happening here?
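
To make the expectation concrete, here is a rough user-space model of the
sequence described above. It is only a sketch, not kernel code: every sim_*
name is hypothetical, and the dispatch loop is a simplified assumption about
the batching behavior, not a copy of the block layer. It shows one queued
command followed by one that fails fast, and how a commit_rqs()-style call
after the failure would still ring the doorbell.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a submission queue; not the kernel's struct. */
struct sim_queue {
	unsigned sq_tail;      /* entries copied into the SQ so far */
	unsigned last_sq_tail; /* tail value last written to the doorbell */
	unsigned db_writes;    /* how many doorbell writes happened */
};

struct sim_req {
	bool ready; /* false models a command rejected before it is copied */
};

/* Ring the doorbell only if there are SQEs the device has not seen. */
static void sim_write_db(struct sim_queue *q)
{
	if (q->sq_tail != q->last_sq_tail) {
		q->last_sq_tail = q->sq_tail;
		q->db_writes++;
	}
}

/* Models a queue_rq() callback: the doorbell is deferred unless "last". */
static bool sim_queue_rq(struct sim_queue *q, const struct sim_req *rq,
			 bool last)
{
	if (!rq->ready)
		return false; /* fail fast: no SQE copy, no doorbell */
	q->sq_tail++;
	if (last)
		sim_write_db(q);
	return true;
}

/* Models a commit_rqs() callback: flush whatever is pending. */
static void sim_commit_rqs(struct sim_queue *q)
{
	sim_write_db(q);
}

/*
 * Simplified dispatch loop (an assumption, not the real block layer).
 * Returns the number of SQEs the device was never told about.
 */
static unsigned sim_dispatch(struct sim_queue *q, const struct sim_req *rqs,
			     size_t n, bool call_commit)
{
	bool failed = false;

	for (size_t i = 0; i < n; i++) {
		if (!sim_queue_rq(q, &rqs[i], i == n - 1))
			failed = true;
	}
	if (failed && call_commit)
		sim_commit_rqs(q);

	assert(q->sq_tail >= q->last_sq_tail);
	return q->sq_tail - q->last_sq_tail;
}
```

Calling sim_dispatch() on a list of { ready, not ready } with call_commit
false leaves one SQE pending and no doorbell write, which is the reported
symptom. With call_commit true, the pending SQE is flushed by a single
doorbell write.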
