[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2263e5a551e24a6ca63dbe33538a5ce8@huawei.com>
Date: Mon, 28 Oct 2024 16:03:31 +0000
From: Shiju Jose <shiju.jose@...wei.com>
To: Borislav Petkov <bp@...en8.de>
CC: "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "tony.luck@...el.com" <tony.luck@...el.com>,
"rafael@...nel.org" <rafael@...nel.org>, "lenb@...nel.org" <lenb@...nel.org>,
"mchehab@...nel.org" <mchehab@...nel.org>, "dan.j.williams@...el.com"
<dan.j.williams@...el.com>, "dave@...olabs.net" <dave@...olabs.net>,
"Jonathan Cameron" <jonathan.cameron@...wei.com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"sudeep.holla@....com" <sudeep.holla@....com>, "jassisinghbrar@...il.com"
<jassisinghbrar@...il.com>, "dave.jiang@...el.com" <dave.jiang@...el.com>,
"alison.schofield@...el.com" <alison.schofield@...el.com>,
"vishal.l.verma@...el.com" <vishal.l.verma@...el.com>, "ira.weiny@...el.com"
<ira.weiny@...el.com>, "david@...hat.com" <david@...hat.com>,
"Vilas.Sridharan@....com" <Vilas.Sridharan@....com>, "leo.duran@....com"
<leo.duran@....com>, "Yazen.Ghannam@....com" <Yazen.Ghannam@....com>,
"rientjes@...gle.com" <rientjes@...gle.com>, "jiaqiyan@...gle.com"
<jiaqiyan@...gle.com>, "Jon.Grimm@....com" <Jon.Grimm@....com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"naoya.horiguchi@....com" <naoya.horiguchi@....com>, "james.morse@....com"
<james.morse@....com>, "jthoughton@...gle.com" <jthoughton@...gle.com>,
"somasundaram.a@....com" <somasundaram.a@....com>, "erdemaktas@...gle.com"
<erdemaktas@...gle.com>, "pgonda@...gle.com" <pgonda@...gle.com>,
"duenwen@...gle.com" <duenwen@...gle.com>, "gthelen@...gle.com"
<gthelen@...gle.com>, "wschwartz@...erecomputing.com"
<wschwartz@...erecomputing.com>, "dferguson@...erecomputing.com"
<dferguson@...erecomputing.com>, "wbs@...amperecomputing.com"
<wbs@...amperecomputing.com>, "nifan.cxl@...il.com" <nifan.cxl@...il.com>,
tanxiaofei <tanxiaofei@...wei.com>, "Zengtao (B)" <prime.zeng@...ilicon.com>,
"Roberto Sassu" <roberto.sassu@...wei.com>, "kangkang.shen@...urewei.com"
<kangkang.shen@...urewei.com>, wanghuiqiang <wanghuiqiang@...wei.com>,
Linuxarm <linuxarm@...wei.com>
Subject: RE: [PATCH v14 03/14] EDAC: Add ECS control feature
Thanks for the comments.
>-----Original Message-----
>From: Borislav Petkov <bp@...en8.de>
>Sent: 28 October 2024 11:17
>To: Shiju Jose <shiju.jose@...wei.com>
>Cc: linux-edac@...r.kernel.org; linux-cxl@...r.kernel.org; linux-
>acpi@...r.kernel.org; linux-mm@...ck.org; linux-kernel@...r.kernel.org;
>tony.luck@...el.com; rafael@...nel.org; lenb@...nel.org;
>mchehab@...nel.org; dan.j.williams@...el.com; dave@...olabs.net; Jonathan
>Cameron <jonathan.cameron@...wei.com>; gregkh@...uxfoundation.org;
>sudeep.holla@....com; jassisinghbrar@...il.com; dave.jiang@...el.com;
>alison.schofield@...el.com; vishal.l.verma@...el.com; ira.weiny@...el.com;
>david@...hat.com; Vilas.Sridharan@....com; leo.duran@....com;
>Yazen.Ghannam@....com; rientjes@...gle.com; jiaqiyan@...gle.com;
>Jon.Grimm@....com; dave.hansen@...ux.intel.com;
>naoya.horiguchi@....com; james.morse@....com; jthoughton@...gle.com;
>somasundaram.a@....com; erdemaktas@...gle.com; pgonda@...gle.com;
>duenwen@...gle.com; gthelen@...gle.com;
>wschwartz@...erecomputing.com; dferguson@...erecomputing.com;
>wbs@...amperecomputing.com; nifan.cxl@...il.com; tanxiaofei
><tanxiaofei@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>; Roberto
>Sassu <roberto.sassu@...wei.com>; kangkang.shen@...urewei.com;
>wanghuiqiang <wanghuiqiang@...wei.com>; Linuxarm
><linuxarm@...wei.com>
>Subject: Re: [PATCH v14 03/14] EDAC: Add ECS control feature
>
>On Fri, Oct 25, 2024 at 06:13:44PM +0100, shiju.jose@...wei.com wrote:
>> +What: /sys/bus/edac/devices/<dev-
>name>/ecs_fruX/log_entry_type
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@...r.kernel.org
>> +Description:
>> + (RW) The log entry type of how the DDR5 ECS log is reported.
>> + 00b - per DRAM.
>> + 01b - per memory media FRU.
>
>If the conversion function here is kstrtoul(), why are those values not "0"
>and "1" but in binary format?
Followed the description for the " Log Entry Type " in the CXL spec rev 3.1 Table 8-210.
DDR5 ECS Control Feature Readable Attributes and Table 8-211. DDR5 ECS Control Feature
Writable Attributes. This was written as " Common DDR5 ECS Log Capabilities " though actually returns
the status of the log_entry_type.
Common DDR5 ECS Log Capabilities
• Bits[1:0]: Log Entry Type: The log entry type of how the ECS log is
reported. The entry type is defined commonly for all memory media FRUs
within the device.
— 00b = Per DRAM
— 01b = Per Memory Media FRU
— All other encodings are reserved
• Bits[7:2]: Reserved.
I will update to 0 and 1.
>
>> +
>> +What: /sys/bus/edac/devices/<dev-
>name>/ecs_fruX/log_entry_type_per_dram
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@...r.kernel.org
>> +Description:
>> + (RO) True if current log entry type is per DRAM.
>> +
>> +What: /sys/bus/edac/devices/<dev-
>name>/ecs_fruX/log_entry_type_per_memory_media
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@...r.kernel.org
>> +Description:
>> + (RO) True if current log entry type is per memory media FRU.
>
>What's the point of those two if log_entry_type already gives you the same info?
>
This was written as " Common DDR5 ECS Log Capabilities " in the CXL spec rev 3.1
Table 8-210. DDR5 ECS Control Feature Readable Attributes, though actually returns
the status of the log_entry_type. I will remove these extra log type attributes.
>And the filename length is a bit too much...
>
>> +
>> +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/mode
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@...r.kernel.org
>> +Description:
>> + (RW) The mode of how the DDR5 ECS counts the errors.
>> + 0 - ECS counts rows with errors.
>> + 1 - ECS counts codewords with errors.
>
>Now we have "0" and "1"s. Oh well.
>
>What are "rows", what are "codewords"? Explain them here pls for the user.
Will do.
>
>> +What: /sys/bus/edac/devices/<dev-
>name>/ecs_fruX/mode_counts_rows
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@...r.kernel.org
>> +Description:
>> + (RO) True if current mode is ECS counts rows with errors.
>> +
>> +What: /sys/bus/edac/devices/<dev-
>name>/ecs_fruX/mode_counts_codewords
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@...r.kernel.org
>> +Description:
>> + (RO) True if current mode is ECS counts codewords with errors.
>
>Same question as above - redundant files.
I will remove redundant files.
>
>> +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/reset
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@...r.kernel.org
>> +Description:
>> + (WO) ECS reset ECC counter.
>> + 1 - reset ECC counter to the default value.
>
>1 or any value?
>
>Looks like any to me...
>
>You should restrict it to "1" in case you want to extend this interface with "2" in
>the future, for example, doing something a bit different.
I will update. Returns error for any other value set than '1'.
>
>> +
>> +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/threshold
>> +Date: Jan 2025
>> +KernelVersion: 6.13
>> +Contact: linux-edac@...r.kernel.org
>> +Description:
>> + (RW) ECS threshold count per gigabits of memory cells.
>
>That definitely needs more explanation.
Sure.
>
>> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile index
>> 188501e676c7..b24c2c112d9c 100644
>> --- a/drivers/edac/Makefile
>> +++ b/drivers/edac/Makefile
>> @@ -10,7 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o
>>
>> edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o
>> edac_core-y += edac_module.o edac_device_sysfs.o wq.o
>> -edac_core-y += scrub.o
>> +edac_core-y += scrub.o ecs.o
>>
>> edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o
>>
>> diff --git a/drivers/edac/ecs.c b/drivers/edac/ecs.c new file mode
>> 100755 index 000000000000..a2b64d7bf6b6
>> --- /dev/null
>> +++ b/drivers/edac/ecs.c
>> @@ -0,0 +1,240 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Generic ECS driver in order to support control the on die
>> + * error check scrub (e.g. DDR5 ECS).
>
>This sentence needs grammar check.
Will do.
>
>> The common sysfs ECS
>> + * interface abstracts the control of an arbitrary ECS
>> + * functionality to a common set of functions.
>> + *
>> + * Copyright (c) 2024 HiSilicon Limited.
>> + */
>> +
>
>#undef pr_fmt
>
>> +#define pr_fmt(fmt) "EDAC ECS: " fmt
>
>Grep the tree for examples how to do that properly.
>
>Also, this pr_fmt looks unused.
I will remove.
>
>> +static umode_t ecs_attr_visible(struct kobject *kobj, struct
>> +attribute *a, int attr_id) {
>> + struct device *ras_feat_dev = kobj_to_dev(kobj);
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
>> +
>> + switch (attr_id) {
>> + case ECS_LOG_ENTRY_TYPE:
>> + if (ops->get_log_entry_type) {
>> + if (ops->set_log_entry_type)
>> + return a->mode;
>> + else
>> + return 0444;
>
>What is the goal for the access mode of all those sysfs entries? I sure hope it is
>going to be root-only no-matter what. I don't want normal users to cause scrub
>activity. Please make sure your whole set does that.
This is the attribute_group's is_visible() callback for controlling the visibility
of the attributes based on whether attribute is added or not by the parent driver.
Yes. Writable only by the root.
>
>> + }
>> + break;
>> + case ECS_LOG_ENTRY_TYPE_PER_DRAM:
>> + if (ops->get_log_entry_type_per_dram)
>> + return a->mode;
>> + break;
[...]
> EDAC_ECS_ATTR_RO(log_entry_type_per_dram, fru);
>> + fru_ctx-
>>ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA] =
>> +
> EDAC_ECS_ATTR_RO(log_entry_type_per_memory_media, fru);
>> + fru_ctx->ecs_dev_attr[ECS_MODE] =
>EDAC_ECS_ATTR_RW(mode, fru);
>> + fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_ROWS] =
>> +
> EDAC_ECS_ATTR_RO(mode_counts_rows, fru);
>> + fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_CODEWORDS] =
>> +
> EDAC_ECS_ATTR_RO(mode_counts_codewords, fru);
>> + fru_ctx->ecs_dev_attr[ECS_RESET] =
>EDAC_ECS_ATTR_WO(reset, fru);
>> + fru_ctx->ecs_dev_attr[ECS_THRESHOLD] =
>EDAC_ECS_ATTR_RW(threshold,
>> +fru);
>
>Clearly too long variable and define names. Shorten pls.
Will do
>
>Also, a new line here:
Ok.
>
><---
>
>
[...]
>> + case RAS_FEAT_ECS:
>> + if (!ras_features->ecs_ops)
>> + goto data_mem_free;
>
><---- newline here.
Wil do.
>
>> + dev_data = &ctx->ecs;
>> + dev_data->ecs_ops = ras_features->ecs_ops;
>> + dev_data->private = ras_features->ctx;
>> + ret = edac_ecs_get_desc(parent,
>&ras_attr_groups[attr_gcnt],
>> + ras_features-
>>ecs_info.num_media_frus);
>> + if (ret)
>> + goto data_mem_free;
>
>Ditto.
Wil do.
>
>> + attr_gcnt += ras_features->ecs_info.num_media_frus;
>> + break;
>> default:
>> ret = -EINVAL;
>> goto data_mem_free;
>
>--
>Regards/Gruss,
> Boris.
Thanks,
Shiju
Powered by blists - more mailing lists