Message-ID: <df3246bc-a5e7-4835-8d75-ca468b6bcd27@gmail.com>
Date: Tue, 3 Feb 2026 14:49:01 -0800
From: James Smart <jsmart833426@...il.com>
To: Mohamed Khalfella <mkhalfella@...estorage.com>,
 Justin Tee <justin.tee@...adcom.com>,
 Naresh Gottumukkala <nareshgottumukkala83@...il.com>,
 Paul Ely <paul.ely@...adcom.com>, Chaitanya Kulkarni <kch@...dia.com>,
 Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
 Keith Busch <kbusch@...nel.org>, Sagi Grimberg <sagi@...mberg.me>
Cc: Aaron Dailey <adailey@...estorage.com>,
 Randy Jennings <randyj@...estorage.com>,
 Dhaval Giani <dgiani@...estorage.com>, Hannes Reinecke <hare@...e.de>,
 linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
 jsmart833426@...il.com
Subject: Re: [PATCH v2 12/14] nvme-fc: Decouple error recovery from controller
 reset

On 2/3/2026 11:19 AM, James Smart wrote:
> On 1/30/2026 2:34 PM, Mohamed Khalfella wrote:
...
>>   static void
>>   nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
>>   {
>> @@ -2049,9 +2061,8 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
>>           nvme_fc_complete_rq(rq);
>>   check_error:
>> -    if (terminate_assoc &&
>> -        nvme_ctrl_state(&ctrl->ctrl) != NVME_CTRL_RESETTING)
>> -        queue_work(nvme_reset_wq, &ctrl->ioerr_work);
>> +    if (terminate_assoc)
>> +        nvme_fc_start_ioerr_recovery(ctrl, "io error");
> 
> this is ok. the ioerr_recovery will bounce the transition to RESETTING 
> if the controller is already in that state. So this is a little cleaner.

What is problematic here: if the start_ioerr path includes the 
CONNECTING logic that terminates i/o's, it runs in the LLDD's context 
that called this iodone routine. Not good. In the existing code, the 
LLDD context was swapped to the work queue, where error_recovery was 
then called.
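
For reference, the shape of the existing deferral (a minimal sketch; 
the work handler body is paraphrased from the current driver):

    /*
     * Existing pattern: the LLDD's completion callback only queues the
     * ioerr work; the association teardown then runs in workqueue
     * context, never in the LLDD's calling context.
     */
    static void
    nvme_fc_ctrl_ioerr_work(struct work_struct *work)
    {
        struct nvme_fc_ctrl *ctrl =
            container_of(work, struct nvme_fc_ctrl, ioerr_work);

        nvme_fc_error_recovery(ctrl, "transport detected io error");
    }

    /* and in nvme_fc_fcpio_done(), still in LLDD context: */
    if (terminate_assoc &&
        nvme_ctrl_state(&ctrl->ctrl) != NVME_CTRL_RESETTING)
        queue_work(nvme_reset_wq, &ctrl->ioerr_work);

That context switch is what must not be lost here.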

> 
>>   }
>>   static int
>> @@ -2495,39 +2506,6 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
>>           nvme_unquiesce_admin_queue(&ctrl->ctrl);
>>   }
>> -static void
>> -nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
>> -{
>> -    enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
>> -
>> -    /*
>> -     * if an error (io timeout, etc) while (re)connecting, the remote
>> -     * port requested terminating of the association (disconnect_ls)
>> -     * or an error (timeout or abort) occurred on an io while creating
>> -     * the controller.  Abort any ios on the association and let the
>> -     * create_association error path resolve things.
>> -     */
>> -    if (state == NVME_CTRL_CONNECTING) {
>> -        __nvme_fc_abort_outstanding_ios(ctrl, true);
>> -        dev_warn(ctrl->ctrl.device,
>> -            "NVME-FC{%d}: transport error during (re)connect\n",
>> -            ctrl->cnum);
>> -        return;
>> -    }
> 
> This logic needs to be preserved. It's no longer part of 
> nvme_fc_start_ioerr_recovery(). Failures during CONNECTING should not 
> be "fenced"; they should fail immediately.

this logic, if left in start_ioerr_recovery, runs straight into the 
LLDD-context problem above: the aborts would execute in the caller's 
context rather than on the work queue. It needs to stay, but on the 
workqueue side of the bounce.
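
A hedged sketch of one way to keep it (hypothetical shape, assuming 
ioerr_work stays the workqueue entry point; not the patch's actual 
code):

    static void
    nvme_fc_ctrl_ioerr_work(struct work_struct *work)
    {
        struct nvme_fc_ctrl *ctrl =
            container_of(work, struct nvme_fc_ctrl, ioerr_work);

        /*
         * Failures during CONNECTING should not be fenced: abort the
         * outstanding ios and let the create_association error path
         * resolve things, as the removed helper did.
         */
        if (nvme_ctrl_state(&ctrl->ctrl) == NVME_CTRL_CONNECTING) {
            __nvme_fc_abort_outstanding_ios(ctrl, true);
            dev_warn(ctrl->ctrl.device,
                "NVME-FC{%d}: transport error during (re)connect\n",
                ctrl->cnum);
            return;
        }

        /* otherwise run the normal fencing/reset path */
        nvme_fc_error_recovery(ctrl, "transport detected io error");
    }

That keeps the fail-fast behavior for CONNECTING while the LLDD's 
iodone caller never executes the teardown itself.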


-- james
