Message-ID: <9763c4cf-8ca5-45d4-b723-270548ca1001@suse.de>
Date: Mon, 28 Apr 2025 15:21:18 +0200
From: Hannes Reinecke <hare@...e.de>
To: Daniel Wagner <dwagner@...e.de>, Guenter Roeck <linux@...ck-us.net>
Cc: Daniel Wagner <wagi@...nel.org>, Keith Busch <kbusch@...nel.org>,
 Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>,
 Sagi Grimberg <sagi@...mberg.me>, James Smart <james.smart@...adcom.com>,
 Shinichiro Kawasaki <shinichiro.kawasaki@....com>,
 linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] nvme: only allow entering LIVE from CONNECTING state

On 4/28/25 14:44, Daniel Wagner wrote:
> On Sun, Apr 27, 2025 at 08:59:13AM -0700, Guenter Roeck wrote:
>> Hi,
>>
>> On Fri, Feb 14, 2025 at 09:02:03AM +0100, Daniel Wagner wrote:
>>> The fabric transports and also the PCI transport are not entering the
>>> LIVE state from NEW or RESETTING. This makes the state machine more
>>> restrictive and allows catching unsupported state transitions, e.g.
>>> directly switching from RESETTING to LIVE.
>>>
>>> Signed-off-by: Daniel Wagner <wagi@...nel.org>
>>
>> nvme_handle_aen_notice(), when handling NVME_AER_NOTICE_FW_ACT_STARTING,
>> sets the state to RESETTING and triggers a worker. This worker
>> waits for firmware activation to complete and then tries to set the
>> state back to LIVE. This step now fails.
>>
>> Possibly the handling of NVME_AER_NOTICE_FW_ACT_STARTING needs to be
>> improved. However, leaving the NVME in RESETTING state after an
>> NVME_AER_NOTICE_FW_ACT_STARTING event is worse.
>>
>> I think this patch should be reverted at least for the time being until
>> the handling of NVME_AER_NOTICE_FW_ACT_STARTING no longer relies on a
>> direct state change from RESETTING to LIVE.
> 
> ee59e3820ca9 ("nvme-fc: do not ignore connectivity loss during connecting")
> f13409bb3f91 ("nvme-fc: rely on state transitions to handle connectivity loss")
> 
> depend on the fact that it is not possible to switch from
> NEW/RESETTING directly into LIVE.
> 
> I think it would be better to fix the worker instead of dropping this
> patch and the above fix for the fc transport.
> 
> What about:
> 
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index b502ac07483b..d3c4eacf607f 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4493,7 +4493,8 @@ static void nvme_fw_act_work(struct work_struct *work)
>                  msleep(100);
>          }
> 
> -       if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
> +       if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING) ||
> +           !nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
>                  return;
> 
>          nvme_unquiesce_io_queues(ctrl);

I would rather have a separate state for firmware activation.
(Ab-)using the 'RESETTING' state here has direct implications
for the error handler: to the error handler, 'RESETTING' means
that the error handler has been scheduled, which is not true
for firmware activation.
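
To illustrate the idea, here is a rough, self-contained sketch of a
state table with a dedicated firmware activation state. This is a toy
model only: toy_state, toy_ctrl, toy_change_state() and
TOY_FW_ACTIVATING are hypothetical names, and the transition table is
heavily simplified. It is not the actual nvme_change_ctrl_state()
logic.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel's controller states. */
enum toy_state {
	TOY_NEW,
	TOY_LIVE,
	TOY_RESETTING,
	TOY_CONNECTING,
	TOY_FW_ACTIVATING,	/* hypothetical dedicated state */
};

struct toy_ctrl {
	enum toy_state state;
};

/* Returns true and updates ctrl->state only if the transition is allowed. */
static bool toy_change_state(struct toy_ctrl *ctrl, enum toy_state new_state)
{
	enum toy_state old = ctrl->state;
	bool ok = false;

	switch (new_state) {
	case TOY_LIVE:
		/* LIVE stays unreachable from NEW and RESETTING. */
		ok = (old == TOY_CONNECTING || old == TOY_FW_ACTIVATING);
		break;
	case TOY_RESETTING:
		ok = (old == TOY_NEW || old == TOY_LIVE);
		break;
	case TOY_CONNECTING:
		ok = (old == TOY_NEW || old == TOY_RESETTING);
		break;
	case TOY_FW_ACTIVATING:
		/* Entered from the AEN notice path, not by the error handler. */
		ok = (old == TOY_LIVE);
		break;
	default:
		break;
	}

	if (ok)
		ctrl->state = new_state;
	return ok;
}
```

With something along these lines the firmware activation worker could
go back to LIVE directly, while RESETTING -> LIVE stays forbidden and
the error handler can keep treating RESETTING as "error handler
scheduled".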

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@...e.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
