Message-ID: <833fd772-da6c-4f91-87e3-e13883f1815d@linux.ibm.com>
Date: Sun, 11 Jan 2026 15:03:43 +0530
From: Nilay Shroff <nilay@...ux.ibm.com>
To: John Meneghini <jmeneghi@...hat.com>, Daniel Wagner <wagi@...nel.org>,
        Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
        Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
        James Smart <james.smart@...adcom.com>, Hannes Reinecke <hare@...e.de>,
        Shinichiro Kawasaki <shinichiro.kawasaki@....com>,
        Wen Xiong <wenxiong@...ux.ibm.com>,
        Narayana Murty N <nnmlinux@...ux.ibm.com>
Cc: linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Ewan Milne <emilne@...hat.com>,
        Maurizio Lombardi <mlombard@...hat.com>
Subject: Re: [PATCH 1/2] nvme: only allow entering LIVE from CONNECTING state



On 1/10/26 12:48 AM, John Meneghini wrote:
> Unfortunately, it has been discovered that this patch causes a serious regression on powerpc platforms.
> 
> If anyone has a powerpc platform with an NVMe/PCIe device installed, please run this simple test and see if it works.
> 
> # uname -av
> Linux rdma-cert-03-lp10.rdma.lab.eng.rdu2.redhat.com 6.19.0-rc4+ #1 SMP Wed Jan  7 21:42:54 EST 2026 ppc64le GNU/Linux
> 
> # nvme list-subsys /dev/nvme0n1
> nvme-subsys0 - NQN=nqn.1994-11.com.samsung:nvme:PM1735:HHHL:S4WANA0R400032
>                hostnqn=nqn.2014-08.org.nvmexpress:uuid:1654a627-93b6-4650-ba90-f4dc7a2fd3ee
>                iopolicy=numa
> \
>  +- nvme0 pcie 0018:01:00.0 live optimized
> 
> # nvme subsystem-reset /dev/nvme0; nvme list-subsys /dev/nvme0n1; sleep 1; nvme list-subsys /dev/nvme0n1; nvme list-subsys /dev/nvme0n1;
> nvme-subsys0 - NQN=nqn.1994-11.com.samsung:nvme:PM1735:HHHL:S4WANA0R400032
>                hostnqn=nqn.2014-08.org.nvmexpress:uuid:1654a627-93b6-4650-ba90-f4dc7a2fd3ee
>                iopolicy=numa
> \
>  +- nvme0 pcie 0018:01:00.0 resetting optimized
> [Wed Jan  7 21:59:51 2026] block nvme0n1: no usable path - requeuing I/O
> [Wed Jan  7 21:59:51 2026] block nvme0n1: no usable path - requeuing I/O
> [Wed Jan  7 21:59:51 2026] block nvme0n1: no usable path - requeuing I/O
> [Wed Jan  7 21:59:51 2026] block nvme0n1: no usable path - requeuing I/O
> [Wed Jan  7 21:59:51 2026] block nvme0n1: no usable path - requeuing I/O
> 
> # nvme list-subsys /dev/nvme0n1;
> 
> # nvme list-subsys /dev/nvme0n1;
> nvme-subsys0 - NQN=nqn.1994-11.com.samsung:nvme:PM1735:HHHL:S4WANA0R400032
>                hostnqn=nqn.2014-08.org.nvmexpress:uuid:1654a627-93b6-4650-ba90-f4dc7a2fd3ee
>                iopolicy=numa
> \
>  +- nvme0 pcie 0018:01:00.0 resetting optimized
> 
> At this point the machine is HUNG. It's stuck in the resetting state forever.
> 
> Because /dev/nvme0n1 is the root device, I need to power-cycle/reboot the host to recover.
> /John
> 
> On 2/14/25 3:02 AM, Daniel Wagner wrote:
>> The fabric transports and also the PCI transport are not entering the
>> LIVE state from NEW or RESETTING. This makes the state machine more
>> restrictive and allows to catch not supported state transitions, e.g.
>> directly switching from RESETTING to LIVE.
>>
>> Signed-off-by: Daniel Wagner <wagi@...nel.org>
>> ---
>>   drivers/nvme/host/core.c | 2 --
>>   1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 818d4e49aab51c388af9a48bf9d466fea9cef51b..f028913e2e622ee348e88879c6e6b7e8f8a1cc82 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -564,8 +564,6 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>>       switch (new_state) {
>>       case NVME_CTRL_LIVE:
>>           switch (old_state) {
>> -        case NVME_CTRL_NEW:
>> -        case NVME_CTRL_RESETTING:
>>           case NVME_CTRL_CONNECTING:
>>               changed = true;
>>               fallthrough;
>>
> 
This broke because commit d2fe192348f9 (“nvme: only allow entering LIVE from CONNECTING
state”) no longer allows changing the controller state from RESETTING -> LIVE. We had a
similar state change issue in the firmware activation code, which was fixed by explicitly
transitioning the controller state through RESETTING -> CONNECTING -> LIVE. We could employ
a similar solution here for the subsystem reset case as well.

Currently, the NVMe PCIe subsystem reset code performs the following steps:

1. Sets the controller state to RESETTING
2. Writes the subsystem reset command to the NSSR register
3. Attempts to transition the controller state directly to LIVE

This effectively bypasses the CONNECTING state. The transition to LIVE is artificial but
intentional: writing to the NSSR register causes loss of communication with the NVMe
adapter, and the controller must be marked LIVE so that any I/O in flight at the time the
subsystem reset is issued, or an explicit MMIO read, can trigger EEH recovery and ultimately
restore the communication link between the NVMe adapter and the system.

With the stricter state transition rules introduced by commit d2fe192348f9 (“nvme: only allow
entering LIVE from CONNECTING state”), the direct transition from RESETTING -> LIVE is no longer
permitted, rendering the current logic ineffective.
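
To make the failure mode concrete, here is a minimal userspace sketch of the state check. It
is a toy model and not the kernel's nvme_change_ctrl_state(): the names toy_ctrl,
toy_change_state(), toy_reset_direct() and toy_reset_via_connecting() are hypothetical, only
the states discussed in this thread are modelled, and the allowed transitions reflect my
reading of the state machine after d2fe192348f9.

```c
#include <stdbool.h>

/* Toy model only: a subset of controller states, hypothetical names. */
enum toy_state { TOY_NEW, TOY_LIVE, TOY_RESETTING, TOY_CONNECTING };

struct toy_ctrl {
	enum toy_state state;
};

/*
 * Returns true and updates the state only if the transition is allowed.
 * Assumption: LIVE is reachable from CONNECTING only, as after d2fe192348f9.
 */
static bool toy_change_state(struct toy_ctrl *c, enum toy_state new_state)
{
	bool changed = false;

	switch (new_state) {
	case TOY_LIVE:
		changed = (c->state == TOY_CONNECTING);
		break;
	case TOY_RESETTING:
		changed = (c->state == TOY_NEW || c->state == TOY_LIVE);
		break;
	case TOY_CONNECTING:
		changed = (c->state == TOY_NEW || c->state == TOY_RESETTING);
		break;
	default:
		break;
	}

	if (changed)
		c->state = new_state;
	return changed;
}

/* Current sequence: RESETTING, then an attempt to go straight to LIVE. */
static enum toy_state toy_reset_direct(void)
{
	struct toy_ctrl c = { .state = TOY_LIVE };

	toy_change_state(&c, TOY_RESETTING);
	/* The NSSR write would happen here in the real driver. */
	toy_change_state(&c, TOY_LIVE);	/* rejected, return value ignored */
	return c.state;
}

/* Proposed sequence: step through CONNECTING before LIVE. */
static enum toy_state toy_reset_via_connecting(void)
{
	struct toy_ctrl c = { .state = TOY_LIVE };

	toy_change_state(&c, TOY_RESETTING);
	/* The NSSR write would happen here in the real driver. */
	if (!toy_change_state(&c, TOY_CONNECTING) ||
	    !toy_change_state(&c, TOY_LIVE))
		return c.state;	/* bail out, analogous to an error path */
	return c.state;
}
```

In this model toy_reset_direct() leaves the controller in TOY_RESETTING, which matches the
"resetting" state seen in John's output, while toy_reset_via_connecting() ends in TOY_LIVE.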

Taking a cue from the firmware activation fix, it seems reasonable to explicitly transition
the controller state through CONNECTING in the subsystem reset path as well. So how about making
the following change to fix this? 

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 0e4caeab739c..3027bba232de 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1532,7 +1532,10 @@ static int nvme_pci_subsystem_reset(struct nvme_ctrl *ctrl)
        }
 
        writel(NVME_SUBSYS_RESET, dev->bar + NVME_REG_NSSR);
-       nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
+
+       if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING) ||
+           !nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
+               goto unlock;
 
        /*
         * Read controller status to flush the previous write and trigger a

Thanks,
--Nilay
