Date:   Tue, 12 Jul 2022 16:05:31 +0200
From:   Niklas Schnelle <schnelle@...ux.ibm.com>
To:     Hannes Reinecke <hare@...e.de>, Christoph Hellwig <hch@....de>,
        Keith Busch <kbusch@...nel.org>
Cc:     Stefan Roese <sr@...x.de>, Matthew Rosato <mjrosato@...ux.ibm.com>,
        linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] nvme-pci: fix hang during error recovery when the
 PCI device is isolated

On Tue, 2022-07-12 at 15:49 +0200, Hannes Reinecke wrote:
> On 7/12/22 14:44, Niklas Schnelle wrote:
> > On s390 and powerpc PCI devices are isolated when an error is detected
> > and driver->err_handler->error_detected is called with an inaccessible
> > PCI device and PCI channel state set to pci_channel_io_frozen
> > (see Step 1 in Documentation/PCI/pci-error-recovery.rst).
> > 
> > In the case of NVMe devices nvme_error_detected() then calls
> > nvme_dev_disable(dev, false) and requests a reset. After a successful
> > reset the device is accessible again and nvme_slot_reset() resets the
> > controller and queues nvme_reset_work() which then recovers the
> > controller.
> > 
> > Since commit b98235d3a471 ("nvme-pci: harden drive presence detect in
> > nvme_dev_disable()") however nvme_dev_disable() no longer freezes the
> > queues if pci_device_is_present() returns false. This is the case for an
> > isolated PCI device. In principle this makes sense as there are no
> > accessible hardware queues to run. The problem though is that for
> > a previously live reset controller with online queues nvme_reset_work()
> > calls nvme_wait_freeze() which, without the freeze having been
> > initiated, then hangs forever. Fix this by starting the freeze in
> > nvme_slot_reset() which is the earliest point where we know the device
> > should be accessible again.
> > 
> > Fixes: b98235d3a471 ("nvme-pci: harden drive presence detect in nvme_dev_disable()")
> > Signed-off-by: Niklas Schnelle <schnelle@...ux.ibm.com>
> > ---
> >   drivers/nvme/host/pci.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 193b44755662..7c0c61b74c30 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -3399,6 +3399,7 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
> >   	dev_info(dev->ctrl.device, "restart after slot reset\n");
> >   	pci_restore_state(pdev);
> >   	nvme_reset_ctrl(&dev->ctrl);
> > +	nvme_start_freeze(&dev->ctrl);
> >   	return PCI_ERS_RESULT_RECOVERED;
> >   }
> >   
> I am not sure if that's the right fix.
> From your description the hang occurs as nvme_reset_ctrl() is calling
> nvme_wait_freeze() without a corresponding nvme_start_freeze().
> So why are you calling it _after_ the call to nvme_reset_ctrl()?
> 
> Cheers,
> 
> Hannes

Hmm, the call chain that used to perform the nvme_start_freeze()
is nvme_error_detected()->nvme_dev_disable()->nvme_start_freeze().
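
For reference, the relevant part of nvme_dev_disable() after that
commit looks roughly like this (a paraphrased sketch, not the exact
source):

	static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
	{
		struct pci_dev *pdev = to_pci_dev(dev->dev);

		mutex_lock(&dev->shutdown_lock);
		if (pci_device_is_present(pdev)) {
			/*
			 * Only reached when the device is accessible, so
			 * for an isolated device the freeze below is
			 * skipped.
			 */
			if (dev->ctrl.state == NVME_CTRL_LIVE ||
			    dev->ctrl.state == NVME_CTRL_RESETTING)
				nvme_start_freeze(&dev->ctrl);
			/* ... read CSTS, check if controller is dead ... */
		}
		/* ... quiesce queues, tear down the admin queue, ... */
		mutex_unlock(&dev->shutdown_lock);
	}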

With the referenced commit that nvme_start_freeze() no longer happens,
because the nvme_error_detected() callback occurs before the reset,
while the device is still inaccessible (see Step 1 in
Documentation/PCI/pci-error-recovery.rst).
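
Sketched out, the frozen-channel case looks like this (paraphrased
from drivers/nvme/host/pci.c, other channel states elided):

	static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
						    pci_channel_state_t state)
	{
		struct nvme_dev *dev = pci_get_drvdata(pdev);

		switch (state) {
		case pci_channel_io_frozen:
			/*
			 * The device is isolated and inaccessible here, so
			 * nvme_dev_disable() skips nvme_start_freeze() and
			 * we request the reset with no freeze initiated.
			 */
			nvme_dev_disable(dev, false);
			return PCI_ERS_RESULT_NEED_RESET;
		/* ... */
		}
		return PCI_ERS_RESULT_NEED_RESET;
	}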

There is indeed another nvme_dev_disable() call in nvme_reset_work(),
but in the nvme_slot_reset() path it happens before
pci_enable_device_mem() has been called, so it also doesn't do the
nvme_start_freeze(). I also tried doing the nvme_start_freeze() there,
but at least in my test that broke /sys/bus/pci/devices/<dev>/reset,
though I'm not entirely sure why, since nvme_start_freeze() looks like
a no-op when called a second time. So I'm also not sure what the right
approach is here.
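
For completeness, the ordering in nvme_reset_work() in this path is
roughly (again a paraphrased sketch):

	nvme_dev_disable(dev, false);	/* before pci_enable_device_mem(),
					 * so again no freeze is started */
	/* ... */
	result = nvme_pci_enable(dev);	/* calls pci_enable_device_mem() */
	/* ... */
	if (dev->online_queues > 1) {
		nvme_start_queues(&dev->ctrl);
		nvme_wait_freeze(&dev->ctrl);	/* hangs, the freeze was
						 * never initiated */
		/* ... */
		nvme_unfreeze(&dev->ctrl);
	}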
