Message-ID: <aAuu1RvgwyfXI3AL@kbusch-mbp.dhcp.thefacebook.com>
Date: Fri, 25 Apr 2025 09:48:37 -0600
From: Keith Busch <kbusch@...nel.org>
To: Linjun Bao <meljbao@...il.com>
Cc: Jens Axboe <axboe@...com>, Christoph Hellwig <hch@....de>,
Sagi Grimberg <sagi@...mberg.me>, linux-kernel@...r.kernel.org,
linux-nvme@...ts.infradead.org
Subject: Re: [PATCH] nvme: avoid missing db ring during reset

On Fri, Apr 25, 2025 at 08:01:45PM +0800, Linjun Bao wrote:
> During nvme reset, there is a rare case where a user admin command
> such as smart-log and the nvme_admin_create_sq command issued from
> nvme_setup_io_queues happen to be in the same blk_mq dispatch list,
> with the user command last. nvme_admin_create_sq is dispatched first
> in nvme_queue_rq(); nvme_write_sq_db() is called but returns
> immediately without writing the doorbell because the request is not
> marked "last". The subsequent smart-log ioctl fails fast in
> nvme_fail_nonready_cmd(), skipping both nvme_sq_copy_cmd() and
> nvme_write_sq_db(), so no doorbell write ever occurs. The
> nvme_admin_create_sq command eventually times out.

The block layer is supposed to call the driver's commit_rqs() function
if anything in the dispatch list wasn't successful, which should notify
the controller of any pending SQEs. Is that not happening here?
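
To make the expectation concrete, here is a rough userspace sketch of
that contract. It is a toy model, not the kernel code: every toy_* name
is made up for illustration, and the dispatch loop is reduced to "commit
if any request in the list failed". With the commit step skipped, the
first SQE stays invisible to the controller, which is the hang described
above; with it, the doorbell gets rung once.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a submission queue: 'tail' counts SQEs the
 * driver has copied in, 'doorbell' is the tail value last written to
 * the controller. */
struct toy_sq {
	unsigned int tail;
	unsigned int doorbell;
	unsigned int db_writes;
};

/* 'ready == false' models a command rejected before reaching the SQ. */
struct toy_rq {
	bool ready;
};

/* Ring the doorbell only if there is something new to report. */
static void toy_write_db(struct toy_sq *sq)
{
	if (sq->doorbell != sq->tail) {
		sq->doorbell = sq->tail;
		sq->db_writes++;
	}
}

/* Models the driver's queue_rq: returns true on success. */
static bool toy_queue_rq(struct toy_sq *sq, const struct toy_rq *rq,
			 bool last)
{
	if (!rq->ready)
		return false;	/* fail fast: no SQE copied, no doorbell */
	sq->tail++;		/* stands in for copying the SQE */
	if (last)
		toy_write_db(sq);
	return true;
}

/* Models the driver's commit_rqs: flush whatever is pending. */
static void toy_commit_rqs(struct toy_sq *sq)
{
	toy_write_db(sq);
}

/* Dispatch a list and return how many SQEs the controller has not
 * been told about. 'call_commit' toggles the commit step so both
 * behaviours can be compared. */
static unsigned int toy_dispatch(struct toy_sq *sq,
				 const struct toy_rq *list, size_t n,
				 bool call_commit)
{
	bool need_commit = false;

	for (size_t i = 0; i < n; i++) {
		bool last = (i == n - 1);

		if (!toy_queue_rq(sq, &list[i], last))
			need_commit = true;
	}
	if (need_commit && call_commit)
		toy_commit_rqs(sq);
	return sq->tail - sq->doorbell;
}
```

Feeding this a two-entry list of { ready, not ready } leaves one SQE
pending when call_commit is false and none when it is true.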