Message-ID: <20251126021250.2583630-8-mkhalfella@purestorage.com>
Date: Tue, 25 Nov 2025 18:11:54 -0800
From: Mohamed Khalfella <mkhalfella@...estorage.com>
To: Chaitanya Kulkarni <kch@...dia.com>,
	Christoph Hellwig <hch@....de>,
	Jens Axboe <axboe@...nel.dk>,
	Keith Busch <kbusch@...nel.org>,
	Sagi Grimberg <sagi@...mberg.me>
Cc: Aaron Dailey <adailey@...estorage.com>,
	Randy Jennings <randyj@...estorage.com>,
	John Meneghini <jmeneghi@...hat.com>,
	Hannes Reinecke <hare@...e.de>,
	linux-nvme@...ts.infradead.org,
	linux-kernel@...r.kernel.org,
	Mohamed Khalfella <mkhalfella@...estorage.com>
Subject: [RFC PATCH 07/14] nvme: Add RECOVERING nvme controller state

Add NVME_CTRL_RECOVERING as a new controller state to be used while an
impacted controller is being recovered. A LIVE controller enters the
RECOVERING state when an IO error is encountered. While a controller is
recovering, inflight IOs are not canceled if they time out; they are
canceled after recovery finishes. Also, a recovering controller cannot
be reset or deleted. This is intentional, because a reset or delete
would cancel inflight IOs. When recovery finishes, the impacted
controller transitions from the RECOVERING state to the RESETTING state.
The reset codepath takes care of queue teardown and inflight request
cancellation.

Note that no transition from RECOVERING to RESETTING is added to
nvme_change_ctrl_state(). The reason is that a user should not be
allowed to reset or delete a controller that is being recovered.
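
To illustrate the transition rules described above, here is a minimal
user-space sketch. It is not the kernel code: the model_* names are
hypothetical, only four states are modeled, and the set of old states
from which RESETTING is allowed (NEW and LIVE) is an assumption made for
the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for enum nvme_ctrl_state. */
enum model_state {
	MODEL_NEW,
	MODEL_LIVE,
	MODEL_RECOVERING,
	MODEL_RESETTING,
};

/*
 * Models the checked transition path: returns true if moving from
 * old_state to new_state is allowed. RECOVERING is reachable only from
 * LIVE. RESETTING is assumed reachable from NEW and LIVE, and is
 * deliberately not reachable from RECOVERING.
 */
static bool model_change_allowed(enum model_state old_state,
				 enum model_state new_state)
{
	switch (new_state) {
	case MODEL_RECOVERING:
		return old_state == MODEL_LIVE;
	case MODEL_RESETTING:
		return old_state == MODEL_NEW || old_state == MODEL_LIVE;
	default:
		return false;
	}
}

/* Usage example: a recovering controller cannot be reset via this path. */
static void model_demo(void)
{
	assert(model_change_allowed(MODEL_LIVE, MODEL_RECOVERING));
	assert(!model_change_allowed(MODEL_RECOVERING, MODEL_RESETTING));
}
```

In this model, a reset requested while the controller is RECOVERING is
rejected, which matches the intent that inflight IOs are not canceled
until recovery finishes.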

Add the NVME_CTRL_RECOVERED controller flag. This flag is set on a
controller that is about to schedule delayed work for time-based
recovery.

Signed-off-by: Mohamed Khalfella <mkhalfella@...estorage.com>
---
 drivers/nvme/host/core.c  | 10 ++++++++++
 drivers/nvme/host/nvme.h  |  2 ++
 drivers/nvme/host/sysfs.c |  1 +
 3 files changed, 13 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index aa007a7b9606..f5b84bc327d3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -574,6 +574,15 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 			break;
 		}
 		break;
+	case NVME_CTRL_RECOVERING:
+		switch (old_state) {
+		case NVME_CTRL_LIVE:
+			changed = true;
+			fallthrough;
+		default:
+			break;
+		}
+		break;
 	case NVME_CTRL_RESETTING:
 		switch (old_state) {
 		case NVME_CTRL_NEW:
@@ -761,6 +770,7 @@ blk_status_t nvme_fail_nonready_command(struct nvme_ctrl *ctrl,
 	if (state != NVME_CTRL_DELETING_NOIO &&
 	    state != NVME_CTRL_DELETING &&
 	    state != NVME_CTRL_DEAD &&
+	    state != NVME_CTRL_RECOVERING &&
 	    !test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) &&
 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
 		return BLK_STS_RESOURCE;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 5195a9abfadf..cde427353e0a 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -251,6 +251,7 @@ static inline u16 nvme_req_qid(struct request *req)
 enum nvme_ctrl_state {
 	NVME_CTRL_NEW,
 	NVME_CTRL_LIVE,
+	NVME_CTRL_RECOVERING,
 	NVME_CTRL_RESETTING,
 	NVME_CTRL_CONNECTING,
 	NVME_CTRL_DELETING,
@@ -275,6 +276,7 @@ enum nvme_ctrl_flags {
 	NVME_CTRL_SKIP_ID_CNS_CS	= 4,
 	NVME_CTRL_DIRTY_CAPABILITY	= 5,
 	NVME_CTRL_FROZEN		= 6,
+	NVME_CTRL_RECOVERED		= 7,
 };
 
 struct nvme_ctrl {
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index ae36249ad61e..55f907fb6c86 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -443,6 +443,7 @@ static ssize_t nvme_sysfs_show_state(struct device *dev,
 	static const char *const state_name[] = {
 		[NVME_CTRL_NEW]		= "new",
 		[NVME_CTRL_LIVE]	= "live",
+		[NVME_CTRL_RECOVERING]	= "recovering",
 		[NVME_CTRL_RESETTING]	= "resetting",
 		[NVME_CTRL_CONNECTING]	= "connecting",
 		[NVME_CTRL_DELETING]	= "deleting",
-- 
2.51.2

