[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210521145705.GA29013@redsun51.ssa.fujisawa.hgst.com>
Date: Fri, 21 May 2021 23:57:05 +0900
From: Keith Busch <kbusch@...nel.org>
To: "Verma, Vishal L" <vishal.l.verma@...el.com>
Cc: "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
"hch@....de" <hch@....de>,
"Williams, Dan J" <dan.j.williams@...el.com>,
"Schofield, Alison" <alison.schofield@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Weiny, Ira" <ira.weiny@...el.com>,
"Widawsky, Ben" <ben.widawsky@...el.com>
Subject: Re: [BISECTED] nvme probe failure with v5.13-rc1
On Fri, May 21, 2021 at 05:00:29AM +0000, Verma, Vishal L wrote:
> Hi,
>
> I ran into this failure to probe an nvme device in an emulator
> (simics). It looks like there is a ~60 second wait followed by a
> timeout and a failure to boot (the root device is an nvme disk) with
> these messages in the log:
>
> [ 67.174010] nvme nvme0: I/O 5 QID 0 timeout, disable controller
> [ 67.175793] nvme nvme0: Removing after probe failure status: -4
>
> I bisected this to:
> 5befc7c26e5a ("nvme: implement non-mdts command limits")
>
> It's not immediately obvious to me what's causing the problem.
> Reverting the above commit fixes it. It is easily reproducible - I'd be
> happy to provide more info about the emulated device or test out
> patches or theories.
>
> It is of course possible that the emulated device is behaving in some
> non spec-compliant way, in which case I'd appreciate any help figuring
> out what that is.
Hi Vishal,
The patch you bisected to sends only a single Identify command, so it
sounds like that must be the command that times out. The controller is
not required to support this specific identify (CNS 0x6), but it is
required to produce a response. If the identify is unsupported, the
controller should respond with an appropriate error (Invalid Field In
Command), but it looks like the controller didn't respond at all.
So based on your observation, it sounds like the simics implementation
has an identify bug. The spec doesn't provide a way for the driver to
know ahead of time whether or not this identification is supported, so
the driver just has to try it and react to the status code. If the
implmenetation can't be fixed, then we'll need to quirk your device.
If you want to confirm for certain that the new identify is the source
of your timeout, you could try the following patch and the timeout
should go away:
---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1a73eed61eee..b16d31d82606 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2711,7 +2711,7 @@ static int nvme_init_non_mdts_limits(struct nvme_ctrl *ctrl)
else
ctrl->max_zeroes_sectors = 0;
- if (nvme_ctrl_limited_cns(ctrl))
+ if (true || nvme_ctrl_limited_cns(ctrl))
return 0;
id = kzalloc(sizeof(*id), GFP_KERNEL);
--
Powered by blists - more mailing lists