Message-ID: <9763c4cf-8ca5-45d4-b723-270548ca1001@suse.de>
Date: Mon, 28 Apr 2025 15:21:18 +0200
From: Hannes Reinecke <hare@...e.de>
To: Daniel Wagner <dwagner@...e.de>, Guenter Roeck <linux@...ck-us.net>
Cc: Daniel Wagner <wagi@...nel.org>, Keith Busch <kbusch@...nel.org>,
Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>,
Sagi Grimberg <sagi@...mberg.me>, James Smart <james.smart@...adcom.com>,
Shinichiro Kawasaki <shinichiro.kawasaki@....com>,
linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] nvme: only allow entering LIVE from CONNECTING state
On 4/28/25 14:44, Daniel Wagner wrote:
> On Sun, Apr 27, 2025 at 08:59:13AM -0700, Guenter Roeck wrote:
>> Hi,
>>
>> On Fri, Feb 14, 2025 at 09:02:03AM +0100, Daniel Wagner wrote:
>>> The fabric transports and also the PCI transport are not entering the
>>> LIVE state from NEW or RESETTING. This makes the state machine more
>>> restrictive and allows catching unsupported state transitions, e.g.
>>> directly switching from RESETTING to LIVE.
>>>
>>> Signed-off-by: Daniel Wagner <wagi@...nel.org>
>>
>> nvme_handle_aen_notice(), when handling NVME_AER_NOTICE_FW_ACT_STARTING,
>> sets the state to RESETTING and triggers a worker. This worker
>> waits for firmware activation to complete and then tries to set the
>> state back to LIVE. This step now fails.
>>
>> Possibly the handling of NVME_AER_NOTICE_FW_ACT_STARTING needs to be
>> improved. However, leaving the NVME in RESETTING state after an
>> NVME_AER_NOTICE_FW_ACT_STARTING event is worse.
>>
>> I think this patch should be reverted at least for the time being until
>> the handling of NVME_AER_NOTICE_FW_ACT_STARTING no longer relies on a
>> direct state change from RESETTING to LIVE.
>
> ee59e3820ca9 ("nvme-fc: do not ignore connectivity loss during connecting")
> f13409bb3f91 ("nvme-fc: rely on state transitions to handle connectivity loss")
>
> depend on the fact that it is not possible to switch from
> NEW/RESETTING directly into LIVE.
>
> I think it would be better to fix the worker instead of dropping this patch
> and the above fixes for the fc transport.
>
> What about:
>
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index b502ac07483b..d3c4eacf607f 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4493,7 +4493,8 @@ static void nvme_fw_act_work(struct work_struct *work)
>  		msleep(100);
>  	}
>  
> -	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
> +	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING) ||
> +	    !nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
>  		return;
>  
>  	nvme_unquiesce_io_queues(ctrl);

I would rather have a separate state for firmware activation.
(Ab-)using the 'RESETTING' state here has direct implications
for the error handler, as to the error handler 'RESETTING'
means that the error handler has been scheduled, which is not
true for firmware activation.

Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@...e.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich