lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d75c574e26d94aa9b398acfca0ecac9d@huawei.com>
Date: Fri, 28 Mar 2025 10:18:20 +0000
From: Shiju Jose <shiju.jose@...wei.com>
To: Dan Williams <dan.j.williams@...el.com>, "linux-cxl@...r.kernel.org"
	<linux-cxl@...r.kernel.org>, "dave@...olabs.net" <dave@...olabs.net>,
	Jonathan Cameron <jonathan.cameron@...wei.com>, "dave.jiang@...el.com"
	<dave.jiang@...el.com>, "alison.schofield@...el.com"
	<alison.schofield@...el.com>, "vishal.l.verma@...el.com"
	<vishal.l.verma@...el.com>, "ira.weiny@...el.com" <ira.weiny@...el.com>,
	"david@...hat.com" <david@...hat.com>, "Vilas.Sridharan@....com"
	<Vilas.Sridharan@....com>
CC: "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
	"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "bp@...en8.de" <bp@...en8.de>,
	"tony.luck@...el.com" <tony.luck@...el.com>, "rafael@...nel.org"
	<rafael@...nel.org>, "lenb@...nel.org" <lenb@...nel.org>,
	"mchehab@...nel.org" <mchehab@...nel.org>, "leo.duran@....com"
	<leo.duran@....com>, "Yazen.Ghannam@....com" <Yazen.Ghannam@....com>,
	"rientjes@...gle.com" <rientjes@...gle.com>, "jiaqiyan@...gle.com"
	<jiaqiyan@...gle.com>, "Jon.Grimm@....com" <Jon.Grimm@....com>,
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
	"naoya.horiguchi@....com" <naoya.horiguchi@....com>, "james.morse@....com"
	<james.morse@....com>, "jthoughton@...gle.com" <jthoughton@...gle.com>,
	"somasundaram.a@....com" <somasundaram.a@....com>, "erdemaktas@...gle.com"
	<erdemaktas@...gle.com>, "pgonda@...gle.com" <pgonda@...gle.com>,
	"duenwen@...gle.com" <duenwen@...gle.com>, "gthelen@...gle.com"
	<gthelen@...gle.com>, "wschwartz@...erecomputing.com"
	<wschwartz@...erecomputing.com>, "dferguson@...erecomputing.com"
	<dferguson@...erecomputing.com>, "wbs@...amperecomputing.com"
	<wbs@...amperecomputing.com>, "nifan.cxl@...il.com" <nifan.cxl@...il.com>,
	tanxiaofei <tanxiaofei@...wei.com>, "Zengtao (B)" <prime.zeng@...ilicon.com>,
	Roberto Sassu <roberto.sassu@...wei.com>, "kangkang.shen@...urewei.com"
	<kangkang.shen@...urewei.com>, wanghuiqiang <wanghuiqiang@...wei.com>,
	Linuxarm <linuxarm@...wei.com>
Subject: RE: [PATCH v2 3/8] cxl/edac: Add CXL memory device patrol scrub
 control feature

>-----Original Message-----
>From: Dan Williams <dan.j.williams@...el.com>
>Sent: 27 March 2025 01:47
>To: Shiju Jose <shiju.jose@...wei.com>; linux-cxl@...r.kernel.org;
>dan.j.williams@...el.com; dave@...olabs.net; Jonathan Cameron
><jonathan.cameron@...wei.com>; dave.jiang@...el.com;
>alison.schofield@...el.com; vishal.l.verma@...el.com; ira.weiny@...el.com;
>david@...hat.com; Vilas.Sridharan@....com
>Cc: linux-edac@...r.kernel.org; linux-acpi@...r.kernel.org; linux-
>mm@...ck.org; linux-kernel@...r.kernel.org; bp@...en8.de;
>tony.luck@...el.com; rafael@...nel.org; lenb@...nel.org;
>mchehab@...nel.org; leo.duran@....com; Yazen.Ghannam@....com;
>rientjes@...gle.com; jiaqiyan@...gle.com; Jon.Grimm@....com;
>dave.hansen@...ux.intel.com; naoya.horiguchi@....com;
>james.morse@....com; jthoughton@...gle.com; somasundaram.a@....com;
>erdemaktas@...gle.com; pgonda@...gle.com; duenwen@...gle.com;
>gthelen@...gle.com; wschwartz@...erecomputing.com;
>dferguson@...erecomputing.com; wbs@...amperecomputing.com;
>nifan.cxl@...il.com; tanxiaofei <tanxiaofei@...wei.com>; Zengtao (B)
><prime.zeng@...ilicon.com>; Roberto Sassu <roberto.sassu@...wei.com>;
>kangkang.shen@...urewei.com; wanghuiqiang <wanghuiqiang@...wei.com>;
>Linuxarm <linuxarm@...wei.com>; Shiju Jose <shiju.jose@...wei.com>
>Subject: Re: [PATCH v2 3/8] cxl/edac: Add CXL memory device patrol scrub
>control feature
>
>shiju.jose@ wrote:
>> From: Shiju Jose <shiju.jose@...wei.com>
>>
>> CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub
[...]
>> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c new
>> file mode 100644 index 000000000000..5ec3535785e1
>> --- /dev/null
>> +++ b/drivers/cxl/core/edac.c
>> @@ -0,0 +1,474 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * CXL EDAC memory feature driver.
>> + *
>> + * Copyright (c) 2024-2025 HiSilicon Limited.
>> + *
>> + *  - Supports functions to configure EDAC features of the
>> + *    CXL memory devices.
>> + *  - Registers with the EDAC device subsystem driver to expose
>> + *    the features sysfs attributes to the user for configuring
>> + *    CXL memory RAS feature.
>> + */
>> +
>> +#include <linux/cleanup.h>
>> +#include <linux/edac.h>
>> +#include <linux/limits.h>
>> +#include <cxl/features.h>
>> +#include <cxl.h>
>> +#include <cxlmem.h>
>> +#include "core.h"
>> +
>> +#define CXL_NR_EDAC_DEV_FEATURES 1
>> +
>> +static struct rw_semaphore *cxl_acquire(struct rw_semaphore *rwsem) {
>> +	if (down_read_interruptible(rwsem))
>> +		return NULL;
>> +
>> +	return rwsem;
>> +}
>> +
>> +DEFINE_FREE(cxl_unlock, struct rw_semaphore *, if (_T) up_read(_T))
>
>I know I suggested cxl_acquire() and cxl_unlock(), but this really is a generic
>facility.
>
>Let's call it rwsem_read_intr_acquire() and rwsem_read_release(), and I'll
>follow up later with Peter to see if he wants this to graduate from CXL.
>
>Also, go ahead and define it in cxl.h for now as I think other places in the
>subsystem could benefit from this approach.

Hi Dan,

Thanks for the comments.
Sure. should these definitions in cxl.h  require in a separate patch?

>
>> +
>> +/*
>> + * CXL memory patrol scrub control
>> + */
>> +struct cxl_patrol_scrub_context {
>
>I like "patrol_scrub" spelled out here compared to "ps" used everywhere else.
Will change.

>
>> +	u8 instance;
>> +	u16 get_feat_size;
>> +	u16 set_feat_size;
>> +	u8 get_version;
>> +	u8 set_version;
>> +	u16 effects;
>> +	struct cxl_memdev *cxlmd;
>> +	struct cxl_region *cxlr;
>> +};
>> +
>> +/**
>> + * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data
>structure.
>> + * @enable:     [IN & OUT] enable(1)/disable(0) patrol scrub.
>> + * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is
>changeable.
>> + * @scrub_cycle_hrs:    [IN] Requested patrol scrub cycle in hours.
>> + *                      [OUT] Current patrol scrub cycle in hours.
>> + * @min_scrub_cycle_hrs:[OUT] minimum patrol scrub cycle in hours
>supported.
>> + */
>> +struct cxl_memdev_ps_params {
>> +	bool enable;
>> +	bool scrub_cycle_changeable;
>
>This is set but unused. Even if it were to be used I would expect it to be set in the
>cxl_patrol_scrub_context.
I will add  to cxl_patrol_scrub_context and will add an extra check against this when
user request to change the scrub rate.
>
>> +	u8 scrub_cycle_hrs;
>> +	u8 min_scrub_cycle_hrs;
>> +};
>
>I do not understand the point of this extra object and would prefer to keep
>intermediate data structures to a minimum.
>
>It looks like all this does is provide for short lived parsed caching of the raw
>hardware patrol scrube attributes. Just pass those raw objects around and
>provide helpers to do the conversion.

Sure. Will do. Was added to avoid more number of function parameters. 
>
>The less data structures the less confusion for the next person that has to read
>this code a few years down the road.
>
>> +
>> +enum cxl_scrub_param {
>> +	CXL_PS_PARAM_ENABLE,
>> +	CXL_PS_PARAM_SCRUB_CYCLE,
>> +};
>
>This seems unforuntate, why not make non-zero scrub rate an implied enable
>and zero to disable? A non-zero sentinel value like U32_MAX can indicate "keep
>scrub rate unchanged".

These  enums can be removed with remove using cxl_memdev_ps_params.

>
>> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK BIT(0)
>
>This CXL_MEMDEV_PS prefix is awkward due to overload with 'struct
>cxl_memdev'. Minimize it to something like:
>
>CXL_SCRUB_CONTROL_CHANGEABLE
>CXL_SCRUB_CONTROL_REALTIME
>CXL_SCRUB_CONTROL_CYCLE_MASK
>CXL_SCRUB_CONTROL_MIN_CYCLE_MASK

Will change.
>
>> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK
>BIT(1)
>> +#define CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK GENMASK(7, 0)
>#define
>> +CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK GENMASK(15, 8) #define
>> +CXL_MEMDEV_PS_FLAG_ENABLED_MASK BIT(0)
>
>CXL_SCRUB_CONTROL_ENABLE
>
>...no need to call it a mask when it is just a single-bit, and when it is both the
>status and the control just call it "enable".

Sure. Will change.
>
>> +
>> +/*
>> + * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-222 Device Patrol
>> +Scrub Control
>> + * Feature Readable Attributes.
>> + */
>> +struct cxl_memdev_ps_rd_attrs {
>> +	u8 scrub_cycle_cap;
>> +	__le16 scrub_cycle_hrs;
>
>"hours" is just 2 more characters than "hrs", I think we can afford the extra
>bytes.

Will change.
>
>[..]
>> +int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd) {
>> +	struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
>> +	int num_ras_features = 0;
>> +	u8 scrub_inst = 0;
>> +	int rc;
>> +
>> +	rc = cxl_memdev_scrub_init(cxlmd, &ras_features[num_ras_features],
>> +				   scrub_inst);
>> +	if (rc < 0 && rc != -EOPNOTSUPP)
>> +		return rc;
>> +
>> +	if (rc != -EOPNOTSUPP)
>> +		num_ras_features++;
>> +
>> +	char *cxl_dev_name __free(kfree) =
>> +		kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlmd->dev));
>
>if (!cxl_dev_name)
>	return -ENOMEM;

Will add.
>
>> +
>> +	return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
>> +				 num_ras_features, ras_features); }
>> +EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_edac_register, "CXL");
>> +
>> +int devm_cxl_region_edac_register(struct cxl_region *cxlr) {
>> +	struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES];
>> +	int num_ras_features = 0;
>> +	u8 scrub_inst = 0;
>> +	int rc;
>> +
>> +	rc = cxl_region_scrub_init(cxlr, &ras_features[num_ras_features],
>> +				   scrub_inst);
>> +	if (rc < 0)
>> +		return rc;
>> +
>> +	num_ras_features++;
>> +
>> +	char *cxl_dev_name __free(kfree) =
>> +		kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlr->dev));
>> +
>> +	return edac_dev_register(&cxlr->dev, cxl_dev_name, NULL,
>> +				 num_ras_features, ras_features); }
>> +EXPORT_SYMBOL_NS_GPL(devm_cxl_region_edac_register, "CXL");
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index b3260d433ec7..2aa6eb675fdf 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -3542,6 +3542,11 @@ static int cxl_region_probe(struct device *dev)
>>  	case CXL_PARTMODE_PMEM:
>>  		return devm_cxl_add_pmem_region(cxlr);
>>  	case CXL_PARTMODE_RAM:
>> +		rc = devm_cxl_region_edac_register(cxlr);
>
>Why do only volatile regions get EDAC support? PMEM patrol scrub seems
>equally valid.

Will add devm_cxl_region_edac_register () for PMEM as well.

Thanks,
Shiju

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ