lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250430012915.295136-1-lizhijian@fujitsu.com>
Date: Wed, 30 Apr 2025 09:29:15 +0800
From: Li Zhijian <lizhijian@...itsu.com>
To: linux-cxl@...r.kernel.org
Cc: Jonathan Cameron <jonathan.cameron@...wei.com>,
	Dave Jiang <dave.jiang@...el.com>,
	Alison Schofield <alison.schofield@...el.com>,
	Vishal Verma <vishal.l.verma@...el.com>,
	Ira Weiny <ira.weiny@...el.com>,
	Dan Williams <dan.j.williams@...el.com>,
	linux-kernel@...r.kernel.org,
	Li Zhijian <lizhijian@...itsu.com>
Subject: [RFC PATCH] cxl: Allow reprogramming misconfigured hdm decoders

During kernel booting, CXL drivers will attempt to construct the CXL region
according to the pre-programed(firmware provisioning) HDM decoders.

This construction process will fail for some reasons, in this case, the
userspace cli like ndctl/cxl cannot destroy nor create regions upon the
existing decoders.

Introuce a new flag CXL_DECODER_F_NEED_RESET tell the driver to reset
the decoder during `cxl destroy-region regionN`, so that region can be
create again after that.

Signed-off-by: Li Zhijian <lizhijian@...itsu.com>
---
Cover letter is here.

Hi all,

Previously, we encountered a CXL device with differing IG between the device
and its USP, which resulted in the driver’s failure to automatically
complete the creation of the region, and user cannot create/destroy region
after this failure.

Although this IG issue has been ignored in the new kernel [1], I realized
that if there was an error in firmware provisioning the HDM decoders, the
user might not be able to destroy the unfinished region and recreate it.
This is because certain components related to this region are in an
inconsistent state, preventing the CXL tool from operating on it.

This implies that the OS administrator cannot reconfigure the CXL device,
which is largely contrary to user expectations. I am keen to hear your
thoughts on this matter.

A modified QEMU [2] is able to limitly program the HDM decoders to inject some
wrong decoder configurations.

[1] https://lore.kernel.org/lkml/20250402232552.999634-1-gourry@gourry.net/
[2] https://github.com/zhijianli88/qemu/tree/program-decoder

---
---
 drivers/cxl/core/hdm.c       | 4 +++-
 drivers/cxl/core/region.c    | 6 ++++--
 drivers/cxl/cxl.h            | 3 ++-
 tools/testing/cxl/test/cxl.c | 1 +
 4 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 70cae4ebf8a4..afbbda780d4d 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -559,7 +559,8 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
 			dev_name(&cxled->cxld.region->dev));
 		return -EBUSY;
 	}
-	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) {
+	if (cxled->cxld.flags & CXL_DECODER_F_ENABLE &&
+	    !(cxled->cxld.flags & CXL_DECODER_F_NEED_RESET)) {
 		dev_dbg(dev, "decoder enabled\n");
 		return -EBUSY;
 	}
@@ -918,6 +919,7 @@ static void cxl_decoder_reset(struct cxl_decoder *cxld)
 	up_read(&cxl_dpa_rwsem);
 
 	cxld->flags &= ~CXL_DECODER_F_ENABLE;
+	cxld->flags &= ~CXL_DECODER_F_NEED_RESET;
 
 	/* Userspace is now responsible for reconfiguring this decoder */
 	if (is_endpoint_decoder(&cxld->dev)) {
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c3f4dc244df7..d025e892d07d 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2096,7 +2096,8 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
 	p = &cxlr->params;
 	get_device(&cxlr->dev);
 
-	if (p->state > CXL_CONFIG_ACTIVE) {
+	if (p->state > CXL_CONFIG_ACTIVE ||
+	    cxled->cxld.flags & CXL_DECODER_F_NEED_RESET) {
 		cxl_region_decode_reset(cxlr, p->interleave_ways);
 		p->state = CXL_CONFIG_ACTIVE;
 	}
@@ -3434,7 +3435,8 @@ int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
 		if (device_attach(&cxlr->dev) < 0)
 			dev_err(&cxlr->dev, "failed to enable, range: %pr\n",
 				p->res);
-	}
+	} else
+		cxled->cxld.flags |= CXL_DECODER_F_NEED_RESET;
 
 	put_device(region_dev);
 out:
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index be8a7dc77719..60fae072bbcf 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -331,7 +331,8 @@ int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
 #define CXL_DECODER_F_TYPE3 BIT(3)
 #define CXL_DECODER_F_LOCK  BIT(4)
 #define CXL_DECODER_F_ENABLE    BIT(5)
-#define CXL_DECODER_F_MASK  GENMASK(5, 0)
+#define CXL_DECODER_F_NEED_RESET    BIT(6)
+#define CXL_DECODER_F_MASK  GENMASK(6, 0)
 
 enum cxl_decoder_type {
 	CXL_DECODER_DEVMEM = 2,
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 1c3336095923..5dac454ca0e2 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -709,6 +709,7 @@ static void mock_decoder_reset(struct cxl_decoder *cxld)
 			"%s: out of order reset, expected decoder%d.%d\n",
 			dev_name(&cxld->dev), port->id, port->commit_end);
 	cxld->flags &= ~CXL_DECODER_F_ENABLE;
+	cxld->flags &= ~CXL_DECODER_F_NEED_RESET;
 }
 
 static void default_mock_decoder(struct cxl_decoder *cxld)
-- 
2.27.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ