[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5f3c9cf0-7fee-432a-b6c5-44fb2acb0b1d@gmail.com>
Date: Tue, 10 Feb 2026 14:49:15 -0800
From: James Smart <jsmart833426@...il.com>
To: Mohamed Khalfella <mkhalfella@...estorage.com>
Cc: Justin Tee <justin.tee@...adcom.com>,
Naresh Gottumukkala <nareshgottumukkala83@...il.com>,
Paul Ely <paul.ely@...adcom.com>, Chaitanya Kulkarni <kch@...dia.com>,
Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
Keith Busch <kbusch@...nel.org>, Sagi Grimberg <sagi@...mberg.me>,
Aaron Dailey <adailey@...estorage.com>,
Randy Jennings <randyj@...estorage.com>,
Dhaval Giani <dgiani@...estorage.com>, Hannes Reinecke <hare@...e.de>,
linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 08/14] nvme: Implement cross-controller reset recovery
On 2/10/2026 2:27 PM, Mohamed Khalfella wrote:
> On Tue 2026-02-10 14:09:27 -0800, James Smart wrote:
>> On 1/30/2026 2:34 PM, Mohamed Khalfella wrote:
>> ...
>>> +unsigned long nvme_fence_ctrl(struct nvme_ctrl *ictrl)
>>> +{
>>> + unsigned long deadline, now, timeout;
>>> + struct nvme_ctrl *sctrl;
>>> + u32 min_cntlid = 0;
>>> + int ret;
>>> +
>>> + timeout = nvme_fence_timeout_ms(ictrl);
>>> + dev_info(ictrl->device, "attempting CCR, timeout %lums\n", timeout);
>>> +
>>> + now = jiffies;
>>> + deadline = now + msecs_to_jiffies(timeout);
>>> + while (time_before(now, deadline)) {
>>
>> Q: don't we have something to identify the controller's subsystem
>> supports CCR before we starting selecting controllers and sending CCR ?
>>
>> I would think on older devices that don't support it we should be
>> skipping this loop. The loop could delay the Time-Based delay without
>> any CCR.
>
> I do not think we have something that identifies CCR support at
> subsystem level. The spec defines CCRL at the controller level. The loop
> should not that bad. nvme_find_ctrl_ccr() should return NULL if CCR is
> not supported and nvme_fence_ctrl() will return immediately.
>
>>
>> -- james
>>
I would think CCRL on the failed controller would be enough to assume
the subsystem supports it.
I'm not worried about the coding on the host is so bad. It's more the
multiple paths that must have cmds sent to them and getting error
responses for unknown cmds (should be responded to ok, but you never
know) as well as creating conditions for other errors where there will
be no return for it - e.g. other paths losing connectivity while the ccr
outstanding, etc. yes, they all have to work, but why bother adding
these flows to an old controller that would never do CCR ?
-- james
Powered by blists - more mailing lists