lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6b897863e04e4e588f1af318e4292739@huawei.com>
Date: Wed, 29 Jan 2025 12:28:10 +0000
From: Shiju Jose <shiju.jose@...wei.com>
To: Dan Williams <dan.j.williams@...el.com>, "linux-edac@...r.kernel.org"
	<linux-edac@...r.kernel.org>, "linux-cxl@...r.kernel.org"
	<linux-cxl@...r.kernel.org>, "linux-acpi@...r.kernel.org"
	<linux-acpi@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: "bp@...en8.de" <bp@...en8.de>, "tony.luck@...el.com"
	<tony.luck@...el.com>, "rafael@...nel.org" <rafael@...nel.org>,
	"lenb@...nel.org" <lenb@...nel.org>, "mchehab@...nel.org"
	<mchehab@...nel.org>, "dave@...olabs.net" <dave@...olabs.net>, "Jonathan
 Cameron" <jonathan.cameron@...wei.com>, "dave.jiang@...el.com"
	<dave.jiang@...el.com>, "alison.schofield@...el.com"
	<alison.schofield@...el.com>, "vishal.l.verma@...el.com"
	<vishal.l.verma@...el.com>, "ira.weiny@...el.com" <ira.weiny@...el.com>,
	"david@...hat.com" <david@...hat.com>, "Vilas.Sridharan@....com"
	<Vilas.Sridharan@....com>, "leo.duran@....com" <leo.duran@....com>,
	"Yazen.Ghannam@....com" <Yazen.Ghannam@....com>, "rientjes@...gle.com"
	<rientjes@...gle.com>, "jiaqiyan@...gle.com" <jiaqiyan@...gle.com>,
	"Jon.Grimm@....com" <Jon.Grimm@....com>, "dave.hansen@...ux.intel.com"
	<dave.hansen@...ux.intel.com>, "naoya.horiguchi@....com"
	<naoya.horiguchi@....com>, "james.morse@....com" <james.morse@....com>,
	"jthoughton@...gle.com" <jthoughton@...gle.com>, "somasundaram.a@....com"
	<somasundaram.a@....com>, "erdemaktas@...gle.com" <erdemaktas@...gle.com>,
	"pgonda@...gle.com" <pgonda@...gle.com>, "duenwen@...gle.com"
	<duenwen@...gle.com>, "gthelen@...gle.com" <gthelen@...gle.com>,
	"wschwartz@...erecomputing.com" <wschwartz@...erecomputing.com>,
	"dferguson@...erecomputing.com" <dferguson@...erecomputing.com>,
	"wbs@...amperecomputing.com" <wbs@...amperecomputing.com>,
	"nifan.cxl@...il.com" <nifan.cxl@...il.com>, tanxiaofei
	<tanxiaofei@...wei.com>, "Zengtao (B)" <prime.zeng@...ilicon.com>, "Roberto
 Sassu" <roberto.sassu@...wei.com>, "kangkang.shen@...urewei.com"
	<kangkang.shen@...urewei.com>, wanghuiqiang <wanghuiqiang@...wei.com>,
	Linuxarm <linuxarm@...wei.com>
Subject: RE: [PATCH v18 15/19] cxl/memfeature: Add CXL memory device patrol
 scrub control feature

>-----Original Message-----
>From: Dan Williams <dan.j.williams@...el.com>
>Sent: 27 January 2025 23:17
>To: Shiju Jose <shiju.jose@...wei.com>; Dan Williams
><dan.j.williams@...el.com>; linux-edac@...r.kernel.org; linux-
>cxl@...r.kernel.org; linux-acpi@...r.kernel.org; linux-mm@...ck.org; linux-
>kernel@...r.kernel.org
>Cc: bp@...en8.de; tony.luck@...el.com; rafael@...nel.org; lenb@...nel.org;
>mchehab@...nel.org; dave@...olabs.net; Jonathan Cameron
><jonathan.cameron@...wei.com>; dave.jiang@...el.com;
>alison.schofield@...el.com; vishal.l.verma@...el.com; ira.weiny@...el.com;
>david@...hat.com; Vilas.Sridharan@....com; leo.duran@....com;
>Yazen.Ghannam@....com; rientjes@...gle.com; jiaqiyan@...gle.com;
>Jon.Grimm@....com; dave.hansen@...ux.intel.com;
>naoya.horiguchi@....com; james.morse@....com; jthoughton@...gle.com;
>somasundaram.a@....com; erdemaktas@...gle.com; pgonda@...gle.com;
>duenwen@...gle.com; gthelen@...gle.com;
>wschwartz@...erecomputing.com; dferguson@...erecomputing.com;
>wbs@...amperecomputing.com; nifan.cxl@...il.com; tanxiaofei
><tanxiaofei@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>; Roberto
>Sassu <roberto.sassu@...wei.com>; kangkang.shen@...urewei.com;
>wanghuiqiang <wanghuiqiang@...wei.com>; Linuxarm
><linuxarm@...wei.com>
>Subject: RE: [PATCH v18 15/19] cxl/memfeature: Add CXL memory device patrol
>scrub control feature
>
>Shiju Jose wrote:
>> Hi Dan,
>>
>> Thanks for the comments.
>>
>> Please find reply inline.
>>
>> Thanks,
>> Shiju
>> >-----Original Message-----
>> >From: Dan Williams <dan.j.williams@...el.com>
>> >Sent: 24 January 2025 20:39
>> >To: Shiju Jose <shiju.jose@...wei.com>; linux-edac@...r.kernel.org;
>> >linux- cxl@...r.kernel.org; linux-acpi@...r.kernel.org;
>> >linux-mm@...ck.org; linux- kernel@...r.kernel.org
>> >Cc: bp@...en8.de; tony.luck@...el.com; rafael@...nel.org;
>> >lenb@...nel.org; mchehab@...nel.org; dan.j.williams@...el.com;
>> >dave@...olabs.net; Jonathan Cameron <jonathan.cameron@...wei.com>;
>> >dave.jiang@...el.com; alison.schofield@...el.com;
>> >vishal.l.verma@...el.com; ira.weiny@...el.com; david@...hat.com;
>> >Vilas.Sridharan@....com; leo.duran@....com;
>Yazen.Ghannam@....com;
>> >rientjes@...gle.com; jiaqiyan@...gle.com; Jon.Grimm@....com;
>> >dave.hansen@...ux.intel.com; naoya.horiguchi@....com;
>> >james.morse@....com; jthoughton@...gle.com;
>somasundaram.a@....com;
>> >erdemaktas@...gle.com; pgonda@...gle.com; duenwen@...gle.com;
>> >gthelen@...gle.com; wschwartz@...erecomputing.com;
>> >dferguson@...erecomputing.com; wbs@...amperecomputing.com;
>> >nifan.cxl@...il.com; tanxiaofei <tanxiaofei@...wei.com>; Zengtao (B)
>> ><prime.zeng@...ilicon.com>; Roberto Sassu <roberto.sassu@...wei.com>;
>> >kangkang.shen@...urewei.com; wanghuiqiang
><wanghuiqiang@...wei.com>;
>> >Linuxarm <linuxarm@...wei.com>; Shiju Jose <shiju.jose@...wei.com>
>> >Subject: Re: [PATCH v18 15/19] cxl/memfeature: Add CXL memory device
>> >patrol scrub control feature
>> >
>> >shiju.jose@ wrote:
>> >> From: Shiju Jose <shiju.jose@...wei.com>
>> >>
>> >> CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub
>> >> control feature. The device patrol scrub proactively locates and
>> >> makes corrections to errors in regular cycle.
>> >>
>> >> Allow specifying the number of hours within which the patrol scrub
>> >> must be completed, subject to minimum and maximum limits reported
>> >> by the
>> >device.
>> >> Also allow disabling scrub allowing trade-off error rates against
>> >> performance.
>> >>
>> >> Add support for patrol scrub control on CXL memory devices.
>> >> Register with the EDAC device driver, which retrieves the scrub
>> >> attribute descriptors from EDAC scrub and exposes the sysfs scrub
>> >> control attributes to userspace. For example, scrub control for the
>> >> CXL memory device "cxl_mem0" is exposed in
>> >/sys/bus/edac/devices/cxl_mem0/scrubX/.
>> >>
>> >> Additionally, add support for region-based CXL memory patrol scrub
>control.
>> >> CXL memory regions may be interleaved across one or more CXL memory
>> >> devices. For example, region-based scrub control for "cxl_region1"
>> >> is exposed in /sys/bus/edac/devices/cxl_region1/scrubX/.
>> >>
>> >> Reviewed-by: Dave Jiang <dave.jiang@...el.com>
>> >> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
>> >> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
>> >> Signed-off-by: Shiju Jose <shiju.jose@...wei.com>
>> >> ---
>> >>  Documentation/edac/scrub.rst  |  66 ++++++
>> >>  drivers/cxl/Kconfig           |  17 ++
>> >>  drivers/cxl/core/Makefile     |   1 +
>> >>  drivers/cxl/core/memfeature.c | 392
>> >++++++++++++++++++++++++++++++++++
>> >>  drivers/cxl/core/region.c     |   6 +
>> >>  drivers/cxl/cxlmem.h          |   7 +
>> >>  drivers/cxl/mem.c             |   5 +
>> >>  include/cxl/features.h        |  16 ++
>> >>  8 files changed, 510 insertions(+)  create mode 100644
>> >> drivers/cxl/core/memfeature.c diff --git
>> >> a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst index
>> >> f86645c7f0af..80e986c57885 100644
>> >> --- a/Documentation/edac/scrub.rst
>> >> +++ b/Documentation/edac/scrub.rst
>[..]
>> >
>> >What is this content-free blob of cat and echo statements? Please
>> >write actual documentation with theory of operation, clarification of
>> >assumptions, rationale for defaults, guidance on changing defaults...
>>
>> Jonathan already replied.
>
>I disagree that any of that is useful to include without rationale, and if the
>rationale is already somewhere else then delete the multiple lines of showing
>how 'cat' and 'echo' work with sysfs.
I will discuss with Jonathan on this how to modify. 

>
>[..]
>> >> +	depends on CXL_MEM
>> >
>> >Similar comment, and this also goes away if all of this just moves
>> >into the new cxl_features driver.
>>
>> Agree with  Jonathan told in reply. These are RAS specific features
>> for CXL memory devices and thus added in memfeature.c
>
>Apoligies for this comment, I had meant to delete it along with some other
>commentary along this theme after thinking it through.
>
>I am now advocating that Dave drop his cxl_features driver altogether and
>mirror your approach. I.e. EDAC is registered from existing CXL drivers, and
>FWCTL can be registered against a cxl_memdev just like the fw_upload ABI.
>
>There was a concern that CXL needed a separate FWCTL driver in case
>distributions wanted to have a policy against FWCTL, but given CXL already has
>CONFIG_CXL_MEM_RAW_COMMANDS at compile-time and a wide array of CXL
>bus devices, a cxl_features device is an awkward fit.
Ok. 
>
>[..]
>> >> +static int cxl_ps_get_attrs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
>> >> +			    struct cxl_memdev_ps_params *params) {
>> >> +	struct cxl_memdev *cxlmd;
>> >> +	u16 min_scrub_cycle = 0;
>> >> +	int i, ret;
>> >> +
>> >> +	if (cxl_ps_ctx->cxlr) {
>> >> +		struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
>> >> +		struct cxl_region_params *p = &cxlr->params;
>> >> +
>> >> +		for (i = p->interleave_ways - 1; i >= 0; i--) {
>> >> +			struct cxl_endpoint_decoder *cxled = p->targets[i];
>> >
>> >It looks like this is called directly as a callback from EDAC. Where
>> >is the locking that keeps cxl_ps_ctx->cxlr valid, or p->targets content stable?
>> Jonathan already replied.
>
>I could not find that comment? I *think* it's ok because when the region is in the
>probe state changes will not be made to this list, but it would be useful to at
>least have commentary to that effect. Protect against someone copying this
>code in isolation and not consider the context.
Sure. Will do.
>
>[..]
>> >> +
>> >> +int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct
>> >> +cxl_region *cxlr)
>> >
>> >Please separate this into a memdev helper and a region helper. It is
>> >silly to have two arguments to a function where one is expected to be
>> >NULL at all times, and then have an if else statement inside that to
>> >effectively turn it back into 2 code paths.
>> >
>> >If there is code to be shared amongst those, make *that* the shared helper.
>> I added single function cxl_mem_ras_features_init() for both memdev
>> and region based scrubbing to reduce code size as there were feedbacks try
>reduce code size.
>
>"Succint" and "concise" does not necessarily mean less lines. I would greatly
>prefer a few more lines if it mines not outsourcing complexity to the calling
>context. Readable code means I do not need to wonder
>what:
>
>   cxl_mem_ras_features_init(NULL, cxlr)
>
>...means. I can just read devm_cxl_region_edac_register(cxlr), and know exactly
>what is happening without needing to lose my train of thought to go read what
>semantics cxl_mem_ras_features_init() is implementing.
>
>Note that all the other _init() calls in drivers/cxl/ (outside of module_init
>callbacks), are just purely init work, not object registration. Please keep that
>local style.
Sure. Will add  separate functions for region based edac registration.
>
>> >> +{
>> >> +	struct edac_dev_feature ras_features[CXL_DEV_NUM_RAS_FEATURES];
>> >> +	char cxl_dev_name[CXL_DEV_NAME_LEN];
>> >> +	int num_ras_features = 0;
>> >> +	u8 scrub_inst = 0;
>> >> +	int rc;
>> >> +
>> >> +	rc = cxl_memdev_scrub_init(cxlmd, cxlr,
>> >&ras_features[num_ras_features],
>> >> +				   scrub_inst);
>> >> +	if (rc < 0)
>> >> +		return rc;
>> >> +
>> >> +	scrub_inst++;
>> >> +	num_ras_features++;
>> >> +
>> >> +	if (cxlr)
>> >> +		snprintf(cxl_dev_name, sizeof(cxl_dev_name),
>> >> +			 "cxl_region%d", cxlr->id);
>> >
>> >Why not pass dev_name(&cxlr->dev) directly?
>> Jonathan already replied.
>
>That was purely with the cxl_mem observation, cxlr can be passed directly.
Will check.
>
>> >
>> >> +	else
>> >> +		snprintf(cxl_dev_name, sizeof(cxl_dev_name),
>> >> +			 "%s_%s", "cxl", dev_name(&cxlmd->dev));
>> >
>> >Can a "cxl" directory be created so that the raw name can be used?
>
>In fact we already do something similar for CONFIG_HMEM_REPORTING (i.e.
>an "access%d" device to create a nameed directory of attributes) so it is a
>question for Boris if he wants to tolerate a parent "cxl" device to parent all CXL
>objects in EDAC.
>
>> >
>> >> +
>> >> +	return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
>> >> +				 num_ras_features, ras_features);
>> >
>> >I'm so confused... a few lines down in this patch we have:
>> >
>> >    rc = cxl_mem_ras_features_init(NULL, cxlr);
>> >
>> >...so how can this call to edac_dev_register() unconditionally
>> >de-reference @cxlmd?
>> Thanks for spotting this. It is a bug, need to fix.
>
>
>[..]
>> >> +EXPORT_SYMBOL_NS_GPL(cxl_mem_ras_features_init, "CXL");
>> >> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> >> index b98b1ccffd1c..c2be70cd87f8 100644
>> >> --- a/drivers/cxl/core/region.c
>> >> +++ b/drivers/cxl/core/region.c
>> >> @@ -3449,6 +3449,12 @@ static int cxl_region_probe(struct device *dev)
>> >>  					p->res->start, p->res->end, cxlr,
>> >>  					is_system_ram) > 0)
>> >>  			return 0;
>> >> +
>> >> +		rc = cxl_mem_ras_features_init(NULL, cxlr);
>> >> +		if (rc)
>> >> +			dev_warn(&cxlr->dev, "CXL RAS features init for
>> >region_id=%d failed\n",
>> >> +				 cxlr->id);
>> >
>> >There is more to RAS than EDAC memory scrub so this message is
>> >misleading. It is also unnecessary because the driver continues to
>> >load and the admin, if they care, will notice that the EDAC attributes are
>missing.
>> This message was added for the debugging purpose in CXL driver. I will change
>to  dev_dbg().
>
>...but also stop calling this functionality with the blanket term "RAS".
>It is "EDAC scrub and repair extensions to all the other RAS functionality the CXL
>subsystem handles directly", name it accordingly.
Sure. Will change.

Thanks,
Shiju


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ