lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241014153300.00002942@Huawei.com>
Date: Mon, 14 Oct 2024 15:33:00 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: <shiju.jose@...wei.com>
CC: <linux-edac@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
	<linux-acpi@...r.kernel.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>, <bp@...en8.de>, <tony.luck@...el.com>,
	<rafael@...nel.org>, <lenb@...nel.org>, <mchehab@...nel.org>,
	<dan.j.williams@...el.com>, <dave@...olabs.net>, <dave.jiang@...el.com>,
	<alison.schofield@...el.com>, <vishal.l.verma@...el.com>,
	<ira.weiny@...el.com>, <david@...hat.com>, <Vilas.Sridharan@....com>,
	<leo.duran@....com>, <Yazen.Ghannam@....com>, <rientjes@...gle.com>,
	<jiaqiyan@...gle.com>, <Jon.Grimm@....com>, <dave.hansen@...ux.intel.com>,
	<naoya.horiguchi@....com>, <james.morse@....com>, <jthoughton@...gle.com>,
	<somasundaram.a@....com>, <erdemaktas@...gle.com>, <pgonda@...gle.com>,
	<duenwen@...gle.com>, <gthelen@...gle.com>, <wschwartz@...erecomputing.com>,
	<dferguson@...erecomputing.com>, <wbs@...amperecomputing.com>,
	<nifan.cxl@...il.com>, <tanxiaofei@...wei.com>, <prime.zeng@...ilicon.com>,
	<roberto.sassu@...wei.com>, <kangkang.shen@...urewei.com>,
	<wanghuiqiang@...wei.com>, <linuxarm@...wei.com>
Subject: Re: [PATCH v13 03/18] EDAC: Add ECS control feature

On Wed, 9 Oct 2024 13:41:04 +0100
<shiju.jose@...wei.com> wrote:

> From: Shiju Jose <shiju.jose@...wei.com>
> 
> Add EDAC ECS (Error Check Scrub) control in order to control a memory
> device's ECS feature.
> 
> The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM
> Specification (JESD79-5) and allows the DRAM to internally read, correct
> single-bit errors, and write back corrected data bits to the DRAM array
> while providing transparency to error counts.
> 
> The DDR5 device contains number of memory media FRUs per device. The
> DDR5 ECS feature and thus the ECS control driver supports configuring
> the ECS parameters per FRU.
> 
> The memory devices support ECS feature register with EDAC device
> driver, which retrieves the ECS descriptor from EDAC ECS driver and
> exposes the sysfs ECS control attributes to userspace in
> /sys/bus/edac/devices/<dev-name>/ecs_fruX/.
> 
> The common sysfs ECS control interface abstracts the control of an
> arbitrary ECS functionality to a common set of functions.
> 
> The support for ECS feature is added separately because the DDR5 ECS
> features control attributes are dissimilar from those of the scrub
> feature.
> 
> The sysfs ECS attr nodes would be present only if the client driver
> has implemented the corresponding attr callback function and passed
> in ops to the EDAC RAS feature driver during registration.
> 
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
> Signed-off-by: Shiju Jose <shiju.jose@...wei.com>
Hi Shiju

A few minor bits and bobs inline.

> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/mode_counts_codewords
> +Date:		Oct 2024
> +KernelVersion:	6.12

These all need updating to 6.13 given we missed on 6.12.

> +Contact:	linux-edac@...r.kernel.org
> +Description:
> +		(RO) True if current mode is ECS counts codewords with errors.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/reset
> +Date:		Oct 2024
> +KernelVersion:	6.12
> +Contact:	linux-edac@...r.kernel.org
> +Description:
> +		(WO) ECS reset ECC counter.
> +		0 - normal, ECC counter running actively.

For a write only parameter, maybe just reject anything that isn't 1?

> +		1 - reset ECC counter to the default value.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/threshold
> +Date:		Oct 2024
> +KernelVersion:	6.12
> +Contact:	linux-edac@...r.kernel.org
> +Description:
> +		(RW) ECS threshold count per GB of memory cells.

In the CXL spec it is Gb of memory cells (bits, not bytes).
I'm assuming this is meant to match that.
Maybe safer to spell it out.


> diff --git a/drivers/edac/ecs.c b/drivers/edac/ecs.c
> new file mode 100755
> index 000000000000..a2b64d7bf6b6
> --- /dev/null
> +++ b/drivers/edac/ecs.c


> diff --git a/include/linux/edac.h b/include/linux/edac.h
> index 5344e2cf6808..20bdb08c7626 100644
> --- a/include/linux/edac.h
> +++ b/include/linux/edac.h

>  
> +/**
> + * struct ecs_ops - ECS device operations (all elements optional)
> + * @get_log_entry_type: read the log entry type value.
> + * @set_log_entry_type: set the log entry type value.
> + * @get_log_entry_type_per_dram: read the log entry type per dram value.
> + * @get_log_entry_type_memory_media: read the log entry type per memory media value.
> + * @get_mode: read the mode value.
> + * @set_mode: set the mode value.
> + * @get_mode_counts_rows: read the mode counts rows value.
> + * @get_mode_counts_codewords: read the mode counts codewords value.
> + * @reset: reset the ECS counter.
> + * @get_threshold: read the threshold value.
> + * @set_threshold: set the threshold value.

Maybe it's worth duplicating the ABI docs statement on units?

> + */
> +struct edac_ecs_ops {
> +	int (*get_log_entry_type)(struct device *dev, void *drv_data, int fru_id, u32 *val);
> +	int (*set_log_entry_type)(struct device *dev, void *drv_data, int fru_id, u32 val);
> +	int (*get_log_entry_type_per_dram)(struct device *dev, void *drv_data,
> +					   int fru_id, u32 *val);
> +	int (*get_log_entry_type_per_memory_media)(struct device *dev, void *drv_data,
> +						   int fru_id, u32 *val);
> +	int (*get_mode)(struct device *dev, void *drv_data, int fru_id, u32 *val);
> +	int (*set_mode)(struct device *dev, void *drv_data, int fru_id, u32 val);
> +	int (*get_mode_counts_rows)(struct device *dev, void *drv_data, int fru_id, u32 *val);
> +	int (*get_mode_counts_codewords)(struct device *dev, void *drv_data, int fru_id, u32 *val);
> +	int (*reset)(struct device *dev, void *drv_data, int fru_id, u32 val);
> +	int (*get_threshold)(struct device *dev, void *drv_data, int fru_id, u32 *threshold);
> +	int (*set_threshold)(struct device *dev, void *drv_data, int fru_id, u32 threshold);
> +};



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ