lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241014164055.00002019@Huawei.com>
Date: Mon, 14 Oct 2024 16:40:55 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: <shiju.jose@...wei.com>
CC: <linux-edac@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
	<linux-acpi@...r.kernel.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>, <bp@...en8.de>, <tony.luck@...el.com>,
	<rafael@...nel.org>, <lenb@...nel.org>, <mchehab@...nel.org>,
	<dan.j.williams@...el.com>, <dave@...olabs.net>, <dave.jiang@...el.com>,
	<alison.schofield@...el.com>, <vishal.l.verma@...el.com>,
	<ira.weiny@...el.com>, <david@...hat.com>, <Vilas.Sridharan@....com>,
	<leo.duran@....com>, <Yazen.Ghannam@....com>, <rientjes@...gle.com>,
	<jiaqiyan@...gle.com>, <Jon.Grimm@....com>, <dave.hansen@...ux.intel.com>,
	<naoya.horiguchi@....com>, <james.morse@....com>, <jthoughton@...gle.com>,
	<somasundaram.a@....com>, <erdemaktas@...gle.com>, <pgonda@...gle.com>,
	<duenwen@...gle.com>, <gthelen@...gle.com>, <wschwartz@...erecomputing.com>,
	<dferguson@...erecomputing.com>, <wbs@...amperecomputing.com>,
	<nifan.cxl@...il.com>, <tanxiaofei@...wei.com>, <prime.zeng@...ilicon.com>,
	<roberto.sassu@...wei.com>, <kangkang.shen@...urewei.com>,
	<wanghuiqiang@...wei.com>, <linuxarm@...wei.com>
Subject: Re: [PATCH v13 11/18] cxl/memfeature: Add CXL memory device ECS
 control feature

On Wed, 9 Oct 2024 13:41:12 +0100
<shiju.jose@...wei.com> wrote:

> From: Shiju Jose <shiju.jose@...wei.com>
> 
> CXL spec 3.1 section 8.2.9.9.11.2 describes the DDR5 ECS (Error Check
> Scrub) control feature.
> The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM
> Specification (JESD79-5) and allows the DRAM to internally read, correct
> single-bit errors, and write back corrected data bits to the DRAM array
> while providing transparency to error counts.
I never understood the 'transparency to error counts'.
Maybe from software point of view 
'while reporting error counts to the host'.
Unless anyone else can figure out what that text from the CXL spec
means? (I'm guessing it is cut and paste from the JEDEC spec)

> 
> The ECS control allows the requester to change the log entry type, the ECS
> threshold count provided that the request is within the definition
> specified in DDR5 mode registers, change mode between codeword mode and
> row count mode, and reset the ECS counter.
> 
> Register with EDAC device driver, which gets the ECS attr descriptors
> from the EDAC ECS and expose sysfs ECS control attributes to userspace.
> For example ECS control for the memory media FRU 0 in CXL mem0 device is
> in /sys/bus/edac/devices/cxl_mem0/ecs_fruX/
fru0?
> 
> Signed-off-by: Shiju Jose <shiju.jose@...wei.com>

A few little things in line. In general looks good to me.

> ---
>  drivers/cxl/core/memfeature.c | 467 +++++++++++++++++++++++++++++++++-
>  1 file changed, 461 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/cxl/core/memfeature.c b/drivers/cxl/core/memfeature.c
> index 84d6e887a4fa..567406566c77 100644
> --- a/drivers/cxl/core/memfeature.c
> +++ b/drivers/cxl/core/memfeature.c
> @@ -19,7 +19,7 @@
>  #include <cxl.h>
>  #include <linux/edac.h>
>  
> -#define CXL_DEV_NUM_RAS_FEATURES	1
> +#define CXL_DEV_NUM_RAS_FEATURES	2
>  #define CXL_DEV_HOUR_IN_SECS	3600
>  
>  #define CXL_SCRUB_NAME_LEN	128
> @@ -309,6 +309,420 @@ static const struct edac_scrub_ops cxl_ps_scrub_ops = {
>  	.set_cycle_duration = cxl_patrol_scrub_write_scrub_cycle,
>  };
>  
> +/* CXL DDR5 ECS control definitions */
> +static const uuid_t cxl_ecs_uuid =
> +	UUID_INIT(0xe5b13f22, 0x2328, 0x4a14, 0xb8, 0xba, 0xb9, 0x69, 0x1e,     \

Why the \?

> +		  0x89, 0x33, 0x86);
> +


> +
> +#define	CXL_ECS_LOG_ENTRY_TYPE_MASK	GENMASK(1, 0)
> +#define	CXL_ECS_REALTIME_REPORT_CAP_MASK	BIT(0)
> +#define	CXL_ECS_THRESHOLD_COUNT_MASK	GENMASK(2, 0)
> +#define	CXL_ECS_MODE_MASK	BIT(3)

That name is a little generic. Maybe CXL_ECS_COUNT_MODE_MASK ?

> +#define	CXL_ECS_RESET_COUNTER_MASK	BIT(4)
> +
> +static const u16 ecs_supp_threshold[] = { 0, 0, 0, 256, 1024, 4096 };
> +
> +enum {
> +	ECS_LOG_ENTRY_TYPE_DRAM = 0x0,
> +	ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU = 0x1,
> +};
> +
> +enum {
> +	ECS_THRESHOLD_256 = 3,
> +	ECS_THRESHOLD_1024 = 4,
> +	ECS_THRESHOLD_4096 = 5,
> +};
Perhaps move this above the ecs_supp_threshold array and use
static const ecs_supp_threshold[] = {
	[ECS_THRESHOLD_256] = 256,
	[ECS_THRESHOLD_1024] = 1024,
	[ECS_THRESHOLD_4096] = 4096,
};
which will fill the zeros in for you. You don't care about them anyway
as they are undefined values.

> +
> +enum cxl_ecs_mode {
> +	ECS_MODE_COUNTS_ROWS = 0,
> +	ECS_MODE_COUNTS_CODEWORDS = 1,
> +};

> +
> +static int cxl_mem_ecs_set_attrs(struct device *dev, void *drv_data, int fru_id,
> +				 struct cxl_ecs_params *params, u8 param_type)
> +{
> +	struct cxl_ecs_context *cxl_ecs_ctx = drv_data;
> +	struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd;
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> +	struct cxl_ecs_fru_rd_attrs *fru_rd_attrs;
> +	struct cxl_ecs_fru_wr_attrs *fru_wr_attrs;
> +	size_t rd_data_size, wr_data_size;
> +	u16 num_media_frus, count;
> +	size_t data_size;
> +	int ret;
> +
> +	num_media_frus = cxl_ecs_ctx->num_media_frus;
> +	rd_data_size = cxl_ecs_ctx->get_feat_size;
> +	wr_data_size = cxl_ecs_ctx->set_feat_size;
> +	struct cxl_ecs_rd_attrs *rd_attrs __free(kfree) =
> +				kmalloc(rd_data_size, GFP_KERNEL);
> +	if (!rd_attrs)
> +		return -ENOMEM;
> +
> +	data_size = cxl_get_feature(mds, cxl_ecs_uuid,
> +				    CXL_GET_FEAT_SEL_CURRENT_VALUE,
> +				    rd_attrs, rd_data_size);
> +	if (!data_size)
> +		return -EIO;

blank line here as the next line isn't part of this allocate / check
errors block.

> +	struct cxl_ecs_wr_attrs *wr_attrs __free(kfree) =
> +					kmalloc(wr_data_size, GFP_KERNEL);
> +	if (!wr_attrs)
> +		return -ENOMEM;
> +
> +	/* Fill writable attributes from the current attributes read
CXL uses 
	/*
	 * Fill writable 
style for multiline comments.
 
> +	 * for all the media FRUs.
> +	 */



> +static int cxl_ecs_get_mode_counts_rows(struct device *dev, void *drv_data,
> +					int fru_id, u32 *val)
> +{
> +	struct cxl_ecs_params params;
> +	int ret;
> +
> +	ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, &params);
> +	if (ret)
> +		return ret;
> +
> +	if (params.mode == ECS_MODE_COUNTS_ROWS)
> +		*val = 1;
> +	else
> +		*val = 0;
> +
> +	return 0;
> +}
> +
> +static int cxl_ecs_get_mode_counts_codewords(struct device *dev, void *drv_data,
> +					     int fru_id, u32 *val)
> +{
> +	struct cxl_ecs_params params;
> +	int ret;

This form is pretty common. Maybe worth some macros like
you have in the edac side of things?

> +
> +	ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, &params);
> +	if (ret)
> +		return ret;
> +
> +	if (params.mode == ECS_MODE_COUNTS_CODEWORDS)
> +		*val = 1;
> +	else
> +		*val = 0;
> +
> +	return 0;
> +}


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ