Message-ID: <20260130223531.2478849-1-mkhalfella@purestorage.com>
Date: Fri, 30 Jan 2026 14:34:04 -0800
From: Mohamed Khalfella <mkhalfella@...estorage.com>
To: Justin Tee <justin.tee@...adcom.com>,
Naresh Gottumukkala <nareshgottumukkala83@...il.com>,
Paul Ely <paul.ely@...adcom.com>,
Chaitanya Kulkarni <kch@...dia.com>,
Christoph Hellwig <hch@....de>,
Jens Axboe <axboe@...nel.dk>,
Keith Busch <kbusch@...nel.org>,
Sagi Grimberg <sagi@...mberg.me>
Cc: Aaron Dailey <adailey@...estorage.com>,
Randy Jennings <randyj@...estorage.com>,
Dhaval Giani <dgiani@...estorage.com>,
Hannes Reinecke <hare@...e.de>,
linux-nvme@...ts.infradead.org,
linux-kernel@...r.kernel.org,
Mohamed Khalfella <mkhalfella@...estorage.com>
Subject: [PATCH v2 00/14] TP8028 Rapid Path Failure Recovery
This patchset adds support for TP8028 Rapid Path Failure Recovery for
both nvme target and initiator. Rapid Path Failure Recovery brings
Cross-Controller Reset (CCR) functionality to nvme. This allows an nvme
host to send an nvme command to a source nvme controller to reset an
impacted nvme controller, provided that both the source and impacted
controllers are in the same nvme subsystem.
The main use of CCR is when one path to nvme subsystem fails. Inflight
IOs on impacted nvme controller need to be terminated first before they
can be retried on another path. Otherwise data corruption may happen.
CCR provides a quick way to terminate these IOs on the unreachable nvme
controller allowing recovery to move quickly and avoiding unnecessary
delays. If CCR is not possible, inflight requests are held for the
duration defined by TP4129 KATO Corrections and Clarifications
before they are allowed to be retried.
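The retry rule in the paragraph above can be modelled with a small
userspace C sketch. This is not code from the series: the enum and
function names are hypothetical, and hold_ms is only a stand-in for the
TP4129-defined duration, whose computation is not covered here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of a CCR attempt; not taken from the patches. */
enum ccr_outcome {
	CCR_SUCCEEDED,   /* impacted controller was reset through another path */
	CCR_UNAVAILABLE, /* CCR could not be used for this controller */
};

/*
 * Return true when inflight requests on the impacted controller may be
 * retried on another path.
 *
 * hold_ms stands in for the hold duration defined by TP4129; how that
 * value is derived is outside the scope of this sketch.
 */
static bool may_retry_inflight(enum ccr_outcome outcome,
			       uint64_t elapsed_ms, uint64_t hold_ms)
{
	if (outcome == CCR_SUCCEEDED)
		return true;            /* IOs were terminated by the reset */
	return elapsed_ms >= hold_ms;   /* otherwise wait out the hold time */
}
```

In this model a successful CCR makes requests retryable immediately,
while the fallback path only releases them once the hold time has
elapsed.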
On the target side:
* New struct members have been added to support CCR. struct nvme_id_ctrl
has been updated with CIU (Controller Instance Uniquifier), CIRN
(Controller Instance Random Number), and CQT (Command Quiesce Time).
The combination of CIU, CNTLID, and CIRN is used to identify impacted
controller in CCR command.
* CCR nvme command implemented on the target causes impacted controller
to fail and drop connections to host.
* CCR logpage contains the status of pending CCR requests. An entry is
added to the logpage after CCR request is validated. Completed CCR
requests are removed from the logpage when controller becomes ready or
when requested in get logpage command.
* An AEN is sent when CCR completes to let the host know that it is safe
to retry inflight requests.
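As an illustration of how the (CIU, CNTLID, CIRN) combination can
single out the impacted controller, here is a minimal userspace C
sketch. The struct, the field widths, and the lookup function are
assumptions made for this example and are not the nvmet data structures
or code from the patches.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified identity of a controller instance.
 * Field widths are assumptions chosen for this sketch.
 */
struct ctrl_id {
	uint16_t cntlid; /* controller ID */
	uint8_t  ciu;    /* Controller Instance Uniquifier */
	uint64_t cirn;   /* Controller Instance Random Number */
};

/*
 * Look up the impacted controller among the controllers of one
 * subsystem. All three fields must match; returns NULL otherwise.
 */
static const struct ctrl_id *find_impacted(const struct ctrl_id *ctrls,
					   size_t n,
					   const struct ctrl_id *want)
{
	for (size_t i = 0; i < n; i++) {
		if (ctrls[i].cntlid == want->cntlid &&
		    ctrls[i].ciu == want->ciu &&
		    ctrls[i].cirn == want->cirn)
			return &ctrls[i];
	}
	return NULL;
}
```

In this sketch a request that carries the right CNTLID but a different
CIU or CIRN does not match any controller.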
On the host side:
* CIU, CIRN, and CQT have been added to struct nvme_ctrl. CIU and CIRN
have been added to sysfs to make the values visible to user. CIU and
CIRN can be used to construct and manually send admin-passthru CCR
commands.
* New controller state NVME_CTRL_RECOVERING has been added to prevent
cancelling timed out inflight requests while CCR is in progress.
Controller flag NVME_CTRL_RECOVERED was also added to signal end of
time-based recovery.
* Controller recovery in nvme_recover_ctrl() is invoked when LIVE
controller hits an error or when a request times out. CCR is attempted
to reset impacted controller.
* Updated nvme fabric transports nvme-tcp, nvme-rdma, and nvme-fc to use
CCR recovery.
Ideally all inflight requests should be held during controller recovery
and only retried after recovery is done. However, there are known
situations where that is not the case in this implementation. These gaps
will be addressed in future patches:
* Manual controller reset from sysfs will result in the controller going
to RESETTING state and all inflight requests being canceled immediately
and possibly retried on another path.
* Manual controller delete from sysfs will also result in all inflight
requests being canceled immediately and possibly retried on another path.
* In nvme-fc the nvme controller will be deleted if the remote port
disappears with no timeout specified. This results in immediate
cancellation of requests that may be retried on another path.
* In nvme-rdma, if the HCA is removed all nvme controllers will be
deleted. This results in inflight IOs being canceled and possibly
retried on another path.
Changes from v1:
* nvmet: Rapid Path Failure Recovery set controller identify fields
- Added subsys->cqt defaults to 0 to maintain current behavior.
- subsys->cqt is configurable via configfs
- Added ctrl->cqt initialized from subsys->cqt.
- Renamed ctrl->uniquifier to ctrl->ciu, ctrl->random to ctrl->cirn.
* nvmet: Implement CCR nvme command
- Refactored nvmet_execute_cross_ctrl_reset() for simpler error handling
- Renamed CCR list from ctrl->ccrs to ctrl->ccr_list.
* nvmet: Implement CCR logpage
- Added CCR status and flags enums
* nvme: Rapid Path Failure Recovery read controller identify fields
- Renamed ctrl sysfs attributes uniquifier -> ciu, random -> cirn
* nvme: Introduce FENCING and FENCED controller states
- Added two states (FENCING and FENCED) instead of (RECOVERING and
controller flag RECOVERED)
- Updated __nvme_check_ready() such that fabric controller in FENCING
state is not ready to send requests. Also a request sent while
controller in FENCING state is completed with host path error
instead of returning BLK_STS_RESOURCE.
* nvme: Implement cross-controller reset recovery
- Renamed nvme_find_ccr_ctrl() to *nvme_find_ctrl_ccr() to pair with
newly added nvme_put_ctrl_ccr(). The latter handles releasing the source
controller used to issue the CCR command.
- Renamed nvme_recover_ctrl() to nvme_fence_ctrl().
- Deleted nvme_end_ctrl_recovery() because the state change has been
moved to nvme_change_ctrl_state().
- Renamed CCR list from ctrl->ccrs to ctrl->ccr_list.
* nvme-tcp: Use CCR to recover controller that hits an error
- Added ctrl->fencing_work and ctrl->fenced_work instead of changing
ctrl->err_work and using it for fencing purpose.
* nvme-rdma: Use CCR to recover controller that hits an error
- Similar change to nvme-tcp.
* nvme-fc: Use CCR to recover controller that hits an error
- Similar to nvme-rdma and nvme-tcp.
* nvme-fc: Hold inflight requests while in RECOVERING state
- Updated nvme_fc_fcpio_done() to hold the first request that starts
error recovery. That was one of the limitations mentioned in the cover
letter of v1.
v1: https://lore.kernel.org/all/20251126021250.2583630-1-mkhalfella@purestorage.com/
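The __nvme_check_ready() change listed in the changelog above can be
pictured with the following userspace C model. It is only a sketch: the
enums and the function are hypothetical names, not kernel definitions,
and states other than LIVE and FENCING are deliberately left unmodelled
because the changelog does not describe them.

```c
/* Hypothetical subset of controller states, named after the changelog. */
enum ctrl_state { CTRL_LIVE, CTRL_FENCING, CTRL_FENCED };

/* Hypothetical dispatch decisions for a newly submitted request. */
enum dispatch_result {
	DISPATCH_SEND,            /* controller is ready, send the request */
	DISPATCH_HOST_PATH_ERROR, /* complete the request with host path error */
	DISPATCH_NOT_MODELLED,    /* behaviour not covered by this sketch */
};

/*
 * Decide what happens to a request submitted to a fabrics controller.
 * Only the FENCING rule comes from the changelog; the LIVE case is the
 * obvious baseline, and everything else is left unmodelled.
 */
static enum dispatch_result fencing_dispatch(enum ctrl_state state)
{
	switch (state) {
	case CTRL_LIVE:
		return DISPATCH_SEND;
	case CTRL_FENCING:
		/* Not ready: fail the request rather than ask for a requeue. */
		return DISPATCH_HOST_PATH_ERROR;
	default:
		return DISPATCH_NOT_MODELLED;
	}
}
```

The point of the model is the FENCING branch: the request is completed
with a host path error instead of being handed back the way a
BLK_STS_RESOURCE return would.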
Mohamed Khalfella (14):
nvmet: Rapid Path Failure Recovery set controller identify fields
nvmet/debugfs: Add ctrl uniquifier and random values
nvmet: Implement CCR nvme command
nvmet: Implement CCR logpage
nvmet: Send an AEN on CCR completion
nvme: Rapid Path Failure Recovery read controller identify fields
nvme: Introduce FENCING and FENCED controller states
nvme: Implement cross-controller reset recovery
nvme: Implement cross-controller reset completion
nvme-tcp: Use CCR to recover controller that hits an error
nvme-rdma: Use CCR to recover controller that hits an error
nvme-fc: Decouple error recovery from controller reset
nvme-fc: Use CCR to recover controller that hits an error
nvme-fc: Hold inflight requests while in FENCING state
 drivers/nvme/host/constants.c   |   1 +
 drivers/nvme/host/core.c        | 208 +++++++++++++++++++++++++++++-
 drivers/nvme/host/fc.c          | 215 ++++++++++++++++++++++----------
 drivers/nvme/host/nvme.h        |  25 ++++
 drivers/nvme/host/rdma.c        |  62 ++++++++-
 drivers/nvme/host/sysfs.c       |  25 ++++
 drivers/nvme/host/tcp.c         |  62 ++++++++-
 drivers/nvme/target/admin-cmd.c | 124 ++++++++++++++++++
 drivers/nvme/target/configfs.c  |  31 +++++
 drivers/nvme/target/core.c      | 108 +++++++++++++++-
 drivers/nvme/target/debugfs.c   |  21 ++++
 drivers/nvme/target/nvmet.h     |  20 ++-
 include/linux/nvme.h            |  70 ++++++++++-
 13 files changed, 897 insertions(+), 75 deletions(-)
base-commit: 8dfce8991b95d8625d0a1d2896e42f93b9d7f68d
--
2.52.0