Message-ID: <CAPpK+O152NEqCrzzLEsUiDiO=CS6OYLfeZ4RN-KGVSH2XTXMOA@mail.gmail.com>
Date: Tue, 6 Jan 2026 19:16:36 -0800
From: Randy Jennings <randyj@...estorage.com>
To: Sagi Grimberg <sagi@...mberg.me>
Cc: Mohamed Khalfella <mkhalfella@...estorage.com>, Chaitanya Kulkarni <kch@...dia.com>,
Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>, Keith Busch <kbusch@...nel.org>,
Aaron Dailey <adailey@...estorage.com>, John Meneghini <jmeneghi@...hat.com>,
Hannes Reinecke <hare@...e.de>, linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 08/14] nvme: Implement cross-controller reset recovery
On Sun, Jan 4, 2026 at 1:14 PM Sagi Grimberg <sagi@...mberg.me> wrote:
> On 31/12/2025 2:04, Randy Jennings wrote:
> >>> +
> >>> +        if (!ret) {
> >>> +                dev_info(ictrl->device, "CCR succeeded using %s\n",
> >>> +                         dev_name(sctrl->device));
> >>> +                blk_put_queue(sctrl->admin_q);
> >>> +                nvme_put_ctrl(sctrl);
> >>> +                return 0;
> >>> +        }
> >>> +
> >>> +        /* Try another controller */
> >>> +        min_cntlid = sctrl->cntlid + 1;
> >> OK, I see why min_cntlid is used. That is very non-intuitive.
> >>
> >> I'm wondering if it will be simpler to take one shot at ccr and,
> >> if it fails, fall back to crt. I mean, if the sctrl is alive and it was
> >> unable to reset the ictrl in time, how would another ctrl do a better
> >> job here?
> > There are many different kinds of failures we are dealing with here
> > that result in a dropped connection (association). It could be a problem
> > with the specific link, or it could be that the node of an HA pair in the
> > storage array went down. In the case of a specific link problem, maybe
> > only one of the connections is down and any controller would work.
> > In the case of the node of an HA pair, roughly half of the connections
> > are going down, and there is a race between the controllers which
> > are detected down first. There were some heuristics put into the
> > spec about deciding which controller to use, but that is more code
> > and a refinement that could come later (and they are still heuristics;
> > they may not be helpful).
> >
> > Because CCR offers a significant win of shortening the recovery time
> > substantially, it is worth retrying on the other controllers. This time
> > affects when we can start retrying IO. KATO is in seconds, and
> > NVMEoF should have the capability of doing a significant amount of
> > IOs in each of those seconds.
>
> But it doesn't actually do I/O; it issues I/O and then waits for it to
> time out.
Retrying CCR does not itself do I/O (trying to pin down what your "it"
refers to), but a successful CCR lets the host get back to doing I/O,
and every second saved can represent a significant amount of I/O. Given
a choice between a 1 second failover and a 60 second failover, of course
you would take the 1 second failover. But given a choice between a 10
second failover and a 60 second failover, I would still take the 10
second failover; the 50 seconds saved are still extremely valuable.
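
To put a rough number on it (purely illustrative; the IOPS figure is an
assumption, not a measurement): at 100,000 IOPS, shaving 50 seconds off
a failover is on the order of 100,000 * 50 = 5,000,000 I/Os that can
complete instead of sitting queued behind the recovery.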
>
> >
> > Besides, the alternative is just to wait. Might as well be actively trying
> > to shorten that wait time. Besides a small increase in code complexity,
> > is there a downside to doing so?
>
> Simplicity is very important when it comes to non-trivial code paths
> like error recovery.
Okay, yes: unwarranted complexity, even with some benefit, might not be
worth it. I can see that my comment could be taken as flippant. But the
extra complexity here buys an important and material benefit.
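
To make the min_cntlid retry concrete, here is a rough sketch of the
loop I have in mind (written for this mail, not copied from the patch;
nvme_find_ccr_ctrl() and nvme_try_ccr() are placeholder names for "find
the next live controller with cntlid >= min_cntlid" and "attempt CCR
through it"):

static int nvme_ccr_retry_all(struct nvme_ctrl *ictrl)
{
        struct nvme_ctrl *sctrl;
        u16 min_cntlid = 0;
        int ret = -ENODEV;

        /* Walk live sibling controllers in cntlid order. */
        while ((sctrl = nvme_find_ccr_ctrl(ictrl, min_cntlid))) {
                ret = nvme_try_ccr(ictrl, sctrl);
                if (!ret) {
                        nvme_put_ctrl(sctrl);
                        return 0;       /* CCR succeeded, I/O can resume */
                }
                /* Try another controller */
                min_cntlid = sctrl->cntlid + 1;
                nvme_put_ctrl(sctrl);
        }
        return ret;
}

The only state carried across attempts is min_cntlid, so the added
complexity over a one-shot attempt is small.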
Sincerely,
Randy Jennings