lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fc2e03ea-5404-e768-0cab-e95adcde6da7@marcan.st>
Date:   Wed, 11 Jan 2023 14:10:42 +0900
From:   Hector Martin <marcan@...can.st>
To:     Christoph Hellwig <hch@....de>
Cc:     Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...com>,
        Sagi Grimberg <sagi@...mberg.me>,
        Eric Curtin <ecurtin@...hat.com>, Janne Grunau <j@...nau.net>,
        Sven Peter <sven@...npeter.dev>,
        Alyssa Rosenzweig <alyssa@...enzweig.io>,
        asahi@...ts.linux.dev, linux-arm-kernel@...ts.infradead.org,
        linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] nvme-apple: Do not try to shut down the controller
 twice

On 11/01/2023 13.54, Christoph Hellwig wrote:
> On Wed, Jan 11, 2023 at 01:36:13PM +0900, Hector Martin wrote:
>> The blamed commit stopped explicitly disabling the controller when we do
>> a controlled shutdown, but apple_nvme_reset_work was only checking for
>> the disable bit before deciding to issue another disable. Check for the
>> shutdown state too, to avoid breakage.
>>
>> This issue does not affect nvme-pci, since it only issues controller
>> shutdowns when the system is actually shutting down anyway.
> 
> There's a few other places where nvme-pci does a shutdown like
> probe/reset failure and most notably and mostly notably various
> power management scenarios.
> 
> What path is causing a problem here for nvme-apple?  I fear we're
> missing some highler level check here and getting further out of
> sync.
> 

OK, so the first question is who is responsible for resetting the
controller after a shutdown? The spec requires a reset in order to bring
it back up from that state. Indeed the PCIe code does an explicit
disable right now (though, judging by the comment, it probably wasn't
done with the intent of resetting after a shutdown, it just happens to
work for that too :))

Right now, apple_nvme_reset_work() tries to check for the condition of
an enabled controller (under the assumption that it's coming from a live
controller, not considering shutdown/sleep) and issue an
apple_nvme_disable(). However, this doesn't work when resuming because
at that point the firmware coprocessor is shut down, so the device isn't
usable (can't even get a disable command to complete properly). Perhaps
a better conditional here would be to check for
apple_rtkit_is_running(), since apple_nvme_disable() can't work otherwise.

But then, even if we get past that, once apple_nvme_reset_work actually
resets the firmware CPU and kicks things back online, the controller is
still logically in the shutdown state. So something has to disable it in
order for nvme_enable_ctrl() to work.

An alternate patch would be to change the gate for apple_nvme_disable()
in apple_nvme_reset_work() to check for apple_rtkit_is_running() on top
of the controller enable state, and then add a further direct call to
nvme_disable_ctrl(..., false) later in apple_nvme_reset_work, once the
firmware is back up, to ensure we can enable it after. Thoughts?

Alternatively, we could just revert to the prior behavior of always
issuing a disable after a shutdown. We need to disable at some point to
come back anyway, so it might as well be done early (before we shut down
firmware, so it works).

- Hector

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ