lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210521145705.GA29013@redsun51.ssa.fujisawa.hgst.com>
Date:   Fri, 21 May 2021 23:57:05 +0900
From:   Keith Busch <kbusch@...nel.org>
To:     "Verma, Vishal L" <vishal.l.verma@...el.com>
Cc:     "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
        "hch@....de" <hch@....de>,
        "Williams, Dan J" <dan.j.williams@...el.com>,
        "Schofield, Alison" <alison.schofield@...el.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Weiny, Ira" <ira.weiny@...el.com>,
        "Widawsky, Ben" <ben.widawsky@...el.com>
Subject: Re: [BISECTED] nvme probe failure with v5.13-rc1

On Fri, May 21, 2021 at 05:00:29AM +0000, Verma, Vishal L wrote:
> Hi,
> 
> I ran into this failure to probe an nvme device in an emulator
> (simics). It looks like there is a ~60 second wait followed by a
> timeout and a failure to boot (the root device is an nvme disk) with
> these messages in the log:
> 
>    [   67.174010] nvme nvme0: I/O 5 QID 0 timeout, disable controller
>    [   67.175793] nvme nvme0: Removing after probe failure status: -4
> 
> I bisected this to:
>    5befc7c26e5a ("nvme: implement non-mdts command limits") 
> 
> It's not immediately obvious to me what's causing the problem.
> Reverting the above commit fixes it. It is easily reproducible - I'd be
> happy to provide more info about the emulated device or test out
> patches or theories.
> 
> It is of course possible that the emulated device is behaving in some
> non spec-compliant way, in which case I'd appreciate any help figuring
> out what that is.

Hi Vishal,

The patch you bisected to sends only a single Identify command, so it
sounds like that must be the command that times out. The controller is
not required to support this specific identify (CNS 0x6), but it is
required to produce a response. If the identify is unsupported, the
controller should respond with an appropriate error (Invalid Field In
Command), but it looks like the controller didn't respond at all.

So based on your observation, it sounds like the simics implementation
has an identify bug. The spec doesn't provide a way for the driver to
know ahead of time whether or not this identification is supported, so
the driver just has to try it and react to the status code. If the
implmenetation can't be fixed, then we'll need to quirk your device.

If you want to confirm for certain that the new identify is the source
of your timeout, you could try the following patch and the timeout
should go away:

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1a73eed61eee..b16d31d82606 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2711,7 +2711,7 @@ static int nvme_init_non_mdts_limits(struct nvme_ctrl *ctrl)
 	else
 		ctrl->max_zeroes_sectors = 0;

-	if (nvme_ctrl_limited_cns(ctrl))
+	if (true || nvme_ctrl_limited_cns(ctrl))
 		return 0;

 	id = kzalloc(sizeof(*id), GFP_KERNEL);
--

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ