Message-ID: <CAPpK+O0dmysf6HxQN68q-JzSm-YATXQ1ZUcz-D56-+4WE-Aj7Q@mail.gmail.com>
Date: Tue, 30 Dec 2025 16:04:05 -0800
From: Randy Jennings <randyj@...estorage.com>
To: Sagi Grimberg <sagi@...mberg.me>
Cc: Mohamed Khalfella <mkhalfella@...estorage.com>, Chaitanya Kulkarni <kch@...dia.com>,
Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>, Keith Busch <kbusch@...nel.org>,
Aaron Dailey <adailey@...estorage.com>, John Meneghini <jmeneghi@...hat.com>,
Hannes Reinecke <hare@...e.de>, linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 08/14] nvme: Implement cross-controller reset recovery
> > +
> > + if (!ret) {
> > + dev_info(ictrl->device, "CCR succeeded using %s\n",
> > + dev_name(sctrl->device));
> > + blk_put_queue(sctrl->admin_q);
> > + nvme_put_ctrl(sctrl);
> > + return 0;
> > + }
> > +
> > + /* Try another controller */
> > + min_cntlid = sctrl->cntlid + 1;
>
> OK, I see why min_cntlid is used. That is very non-intuitive.
>
> I'm wondering if it will be simpler to take one shot at ccr and
> if it fails fall back to crt. I mean, if the sctrl is alive, and it
> was unable to reset the ictrl in time, how would another ctrl do a
> better job here?

There are many different kinds of failures we are dealing with here
that result in a dropped connection (association). It could be a problem
with a specific link, or it could be that one node of an HA pair in the
storage array went down. In the case of a specific link problem, maybe
only one of the connections is down and any other controller would work.
In the case of an HA-pair node failure, roughly half of the connections
are going down, and there is a race over which controllers are detected
as down first. The spec has some heuristics for deciding which controller
to use, but that is more code and a refinement that could come later
(and they are still heuristics; they may not be helpful).
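
To make the min_cntlid iteration more concrete, here is roughly the loop
shape it is meant to express (a sketch only, not the patch code;
nvme_ccr_find_ctrl() and nvme_ccr_reset_one() are made-up names standing
in for the actual lookup and reset steps):

/* Sketch of the CCR retry loop; helper names are hypothetical. */
static int nvme_ccr_recover(struct nvme_ctrl *ictrl)
{
	struct nvme_ctrl *sctrl;
	u16 min_cntlid = 0;
	int ret;

	for (;;) {
		/*
		 * Pick a live controller in the subsystem with the
		 * lowest cntlid >= min_cntlid.
		 */
		sctrl = nvme_ccr_find_ctrl(ictrl->subsys, min_cntlid);
		if (!sctrl)
			break;		/* no candidates left */

		ret = nvme_ccr_reset_one(ictrl, sctrl);
		if (!ret) {
			nvme_put_ctrl(sctrl);
			return 0;	/* CCR succeeded */
		}

		/* This controller could not reset ictrl; try the next one. */
		min_cntlid = sctrl->cntlid + 1;
		nvme_put_ctrl(sctrl);
	}

	return -ENODEV;			/* fall back to normal recovery */
}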

Because CCR can shorten the recovery time substantially, it is worth
retrying on the other controllers. That recovery time determines when we
can start retrying IO: KATO is measured in seconds, and an NVMe-oF fabric
should be capable of completing a significant number of IOs in each of
those seconds.
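
To put rough, purely illustrative numbers on it (assumptions, not
measurements): if KATO is, say, 10 seconds and the fabric sustains on the
order of 100K IOPS, waiting out KATO leaves on the order of a million IOs
stalled or failed over, whereas a CCR that completes in a fraction of a
second keeps that backlog orders of magnitude smaller.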

Besides, the alternative is just to wait; we might as well be actively
trying to shorten that wait. Aside from a small increase in code
complexity, is there a downside to doing so?

Sincerely,
Randy Jennings