[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <08f67030bb81401688ef8a5505f10c7e@huawei.com>
Date: Tue, 14 Jan 2025 10:08:42 +0000
From: Shiju Jose <shiju.jose@...wei.com>
To: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
CC: "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "bp@...en8.de" <bp@...en8.de>,
"tony.luck@...el.com" <tony.luck@...el.com>, "rafael@...nel.org"
<rafael@...nel.org>, "lenb@...nel.org" <lenb@...nel.org>,
"mchehab@...nel.org" <mchehab@...nel.org>, "dan.j.williams@...el.com"
<dan.j.williams@...el.com>, "dave@...olabs.net" <dave@...olabs.net>,
"Jonathan Cameron" <jonathan.cameron@...wei.com>, "dave.jiang@...el.com"
<dave.jiang@...el.com>, "alison.schofield@...el.com"
<alison.schofield@...el.com>, "vishal.l.verma@...el.com"
<vishal.l.verma@...el.com>, "ira.weiny@...el.com" <ira.weiny@...el.com>,
"david@...hat.com" <david@...hat.com>, "Vilas.Sridharan@....com"
<Vilas.Sridharan@....com>, "leo.duran@....com" <leo.duran@....com>,
"Yazen.Ghannam@....com" <Yazen.Ghannam@....com>, "rientjes@...gle.com"
<rientjes@...gle.com>, "jiaqiyan@...gle.com" <jiaqiyan@...gle.com>,
"Jon.Grimm@....com" <Jon.Grimm@....com>, "dave.hansen@...ux.intel.com"
<dave.hansen@...ux.intel.com>, "naoya.horiguchi@....com"
<naoya.horiguchi@....com>, "james.morse@....com" <james.morse@....com>,
"jthoughton@...gle.com" <jthoughton@...gle.com>, "somasundaram.a@....com"
<somasundaram.a@....com>, "erdemaktas@...gle.com" <erdemaktas@...gle.com>,
"pgonda@...gle.com" <pgonda@...gle.com>, "duenwen@...gle.com"
<duenwen@...gle.com>, "gthelen@...gle.com" <gthelen@...gle.com>,
"wschwartz@...erecomputing.com" <wschwartz@...erecomputing.com>,
"dferguson@...erecomputing.com" <dferguson@...erecomputing.com>,
"wbs@...amperecomputing.com" <wbs@...amperecomputing.com>,
"nifan.cxl@...il.com" <nifan.cxl@...il.com>, tanxiaofei
<tanxiaofei@...wei.com>, "Zengtao (B)" <prime.zeng@...ilicon.com>, "Roberto
Sassu" <roberto.sassu@...wei.com>, "kangkang.shen@...urewei.com"
<kangkang.shen@...urewei.com>, wanghuiqiang <wanghuiqiang@...wei.com>,
Linuxarm <linuxarm@...wei.com>
Subject: RE: [PATCH v18 01/19] EDAC: Add support for EDAC device features
control
>-----Original Message-----
>From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
>Sent: 13 January 2025 15:06
>To: Shiju Jose <shiju.jose@...wei.com>
>Cc: linux-edac@...r.kernel.org; linux-cxl@...r.kernel.org; linux-
>acpi@...r.kernel.org; linux-mm@...ck.org; linux-kernel@...r.kernel.org;
>bp@...en8.de; tony.luck@...el.com; rafael@...nel.org; lenb@...nel.org;
>mchehab@...nel.org; dan.j.williams@...el.com; dave@...olabs.net; Jonathan
>Cameron <jonathan.cameron@...wei.com>; dave.jiang@...el.com;
>alison.schofield@...el.com; vishal.l.verma@...el.com; ira.weiny@...el.com;
>david@...hat.com; Vilas.Sridharan@....com; leo.duran@....com;
>Yazen.Ghannam@....com; rientjes@...gle.com; jiaqiyan@...gle.com;
>Jon.Grimm@....com; dave.hansen@...ux.intel.com;
>naoya.horiguchi@....com; james.morse@....com; jthoughton@...gle.com;
>somasundaram.a@....com; erdemaktas@...gle.com; pgonda@...gle.com;
>duenwen@...gle.com; gthelen@...gle.com;
>wschwartz@...erecomputing.com; dferguson@...erecomputing.com;
>wbs@...amperecomputing.com; nifan.cxl@...il.com; tanxiaofei
><tanxiaofei@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>; Roberto
>Sassu <roberto.sassu@...wei.com>; kangkang.shen@...urewei.com;
>wanghuiqiang <wanghuiqiang@...wei.com>; Linuxarm
><linuxarm@...wei.com>
>Subject: Re: [PATCH v18 01/19] EDAC: Add support for EDAC device features
>control
>
>Em Mon, 6 Jan 2025 12:09:57 +0000
><shiju.jose@...wei.com> escreveu:
>
>> From: Shiju Jose <shiju.jose@...wei.com>
>>
>> Add generic EDAC device feature controls supporting the registration
>> of RAS features available in the system. The driver exposes control
>> attributes for these features to userspace in
>> /sys/bus/edac/devices/<dev-name>/<ras-feature>/
>>
>> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
>> Signed-off-by: Shiju Jose <shiju.jose@...wei.com>
>> ---
>> Documentation/edac/features.rst | 94 ++++++++++++++++++++++++++++++
>> Documentation/edac/index.rst | 10 ++++
>> drivers/edac/edac_device.c | 100 ++++++++++++++++++++++++++++++++
>> include/linux/edac.h | 28 +++++++++
>> 4 files changed, 232 insertions(+)
>> create mode 100644 Documentation/edac/features.rst create mode
>> 100644 Documentation/edac/index.rst
>>
>> diff --git a/Documentation/edac/features.rst
>> b/Documentation/edac/features.rst new file mode 100644 index
>> 000000000000..f32f259ce04d
>> --- /dev/null
>> +++ b/Documentation/edac/features.rst
>> @@ -0,0 +1,94 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>
>SPDX should match what's written there, e. g.
>
> .. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-or-later
>
>Please notice that GNU FDL family contains both open source and non-open
>source licenses. The open-source one is this:
>
> https://spdx.org/licenses/GFDL-1.2-no-invariants-or-later.html
>
>E.g. it is a the license permits changing the entire document in the future, as
>there's no invariant parts on it.
This seems not widely used, have seen this is used in few documents only.
>
>> +
>> +============================================
>> +Augmenting EDAC for controlling RAS features
>> +============================================
>> +
>> +Copyright (c) 2024 HiSilicon Limited.
>
>2024-2025?
Will do.
>
>> +
>> +:Author: Shiju Jose <shiju.jose@...wei.com>
>> +:License: The GNU Free Documentation License, Version 1.2
>> + (dual licensed under the GPL v2)
>
>You need to define if invariant parts are allowed or not, e. g.:
>
> :License: The GNU Free Documentation License, Version 1.2 without
>Invariant Sections, Front-Cover Texts nor Back-Cover Texts.
> (dual licensed under the GPL v2)
Same as above.
>
>
>> +:Original Reviewers:
>> +
>> +- Written for: 6.14
>> +
>> +Introduction
>> +------------
>> +The expansion of EDAC for controlling RAS features and exposing
>> +features control attributes to userspace via sysfs. Some Examples:
>> +
>> +* Scrub control
>> +
>> +* Error Check Scrub (ECS) control
>> +
>> +* ACPI RAS2 features
>> +
>> +* Post Package Repair (PPR) control
>> +
>> +* Memory Sparing Repair control etc.
>> +
>> +High level design is illustrated in the following diagram::
>> +
>> + _______________________________________________
>> + | Userspace - Rasdaemon |
>> + | _____________ |
>> + | | RAS CXL mem | _______________ |
>> + | |error handler|---->| | |
>> + | |_____________| | RAS dynamic | |
>> + | _____________ | scrub, memory | |
>> + | | RAS memory |---->| repair control| |
>> + | |error handler| |_______________| |
>> + | |_____________| | |
>> + |__________________________|____________________|
>> + |
>> + |
>> +
>_______________________________|______________________________
>> + | Kernel EDAC extension for | controlling RAS Features |
>> + | ______________________________|____________________________
>|
>> + || EDAC Core Sysfs EDAC| Bus | |
>> + || __________________________|_________ _____________ | |
>> + || |/sys/bus/edac/devices/<dev>/scrubX/ | | EDAC device || |
>> + || |/sys/bus/edac/devices/<dev>/ecsX/ |<->| EDAC MC || |
>> + || |/sys/bus/edac/devices/<dev>/repairX | | EDAC sysfs || |
>> + || |____________________________________| |_____________|| |
>> + || EDAC|Bus | |
>> + || | | |
>> + || __________ Get feature | Get feature | |
>> + || | |desc _________|______ desc __________ | |
>> + || |EDAC scrub|<-----| EDAC device | | | | |
>> + || |__________| | driver- RAS |---->| EDAC mem | | |
>> + || __________ | feature control| | repair | | |
>> + || | |<-----|________________| |__________| | |
>> + || |EDAC ECS | Register RAS|features | |
>> + || |__________| | | |
>> + || ______________________|_____________ | |
>> + ||_________|_______________|__________________|______________|
>|
>> + | _______|____ _______|_______ ____|__________ |
>> + | | | | CXL mem driver| | Client driver | |
>> + | | ACPI RAS2 | | scrub, ECS, | | memory repair | |
>> + | | driver | | sparing, PPR | | features | |
>> + | |____________| |_______________| |_______________| |
>> + | | | | |
>> +
>|________|_________________|____________________|______________|
>> + | | |
>> +
>________|_________________|____________________|______________
>> + | ___|_________________|____________________|_______ |
>> + | | | |
>> + | | Platform HW and Firmware | |
>> + | |__________________________________________________| |
>> +
>|______________________________________________________________|
>> +
>> +
>> +1. EDAC Features components - Create feature specific descriptors.
>> +For example, EDAC scrub, EDAC ECS, EDAC memory repair in the above
>> +diagram.
>> +
>> +2. EDAC device driver for controlling RAS Features - Get feature's
>> +attribute descriptors from EDAC RAS feature component and registers
>> +device's RAS features with EDAC bus and exposes the features control
>> +attributes via the sysfs EDAC bus. For example,
>> +/sys/bus/edac/devices/<dev-name>/<feature>X/
>> +
>> +3. RAS dynamic feature controller - Userspace sample modules in
>> +rasdaemon for dynamic scrub/repair control to issue scrubbing/repair
>> +when excess number of corrected memory errors are reported in a short
>span of time.
>> diff --git a/Documentation/edac/index.rst
>> b/Documentation/edac/index.rst new file mode 100644 index
>> 000000000000..b6c265a4cffb
>> --- /dev/null
>> +++ b/Documentation/edac/index.rst
>> @@ -0,0 +1,10 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +==============
>> +EDAC Subsystem
>> +==============
>> +
>> +.. toctree::
>> + :maxdepth: 1
>> +
>> + features
>> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
>> index 621dc2a5d034..9fce46dd7405 100644
>> --- a/drivers/edac/edac_device.c
>> +++ b/drivers/edac/edac_device.c
>> @@ -570,3 +570,103 @@ void edac_device_handle_ue_count(struct
>edac_device_ctl_info *edac_dev,
>> block ? block->name : "N/A", count, msg); }
>> EXPORT_SYMBOL_GPL(edac_device_handle_ue_count);
>> +
>> +static void edac_dev_release(struct device *dev) {
>> + struct edac_dev_feat_ctx *ctx = container_of(dev, struct
>> +edac_dev_feat_ctx, dev);
>> +
>> + kfree(ctx->dev.groups);
>> + kfree(ctx);
>> +}
>> +
>> +const struct device_type edac_dev_type = {
>> + .name = "edac_dev",
>> + .release = edac_dev_release,
>> +};
>> +
>> +static void edac_dev_unreg(void *data) {
>> + device_unregister(data);
>> +}
>> +
>> +/**
>> + * edac_dev_register - register device for RAS features with EDAC
>> + * @parent: parent device.
>> + * @name: parent device's name.
>> + * @private: parent driver's data to store in the context if any.
>> + * @num_features: number of RAS features to register.
>> + * @ras_features: list of RAS features to register.
>> + *
>> + * Return:
>> + * * %0 - Success.
>> + * * %-EINVAL - Invalid parameters passed.
>> + * * %-ENOMEM - Dynamic memory allocation failed.
>> + *
>> + */
>> +int edac_dev_register(struct device *parent, char *name,
>> + void *private, int num_features,
>> + const struct edac_dev_feature *ras_features) {
>> + const struct attribute_group **ras_attr_groups;
>> + struct edac_dev_feat_ctx *ctx;
>> + int attr_gcnt = 0;
>> + int ret, feat;
>> +
>> + if (!parent || !name || !num_features || !ras_features)
>> + return -EINVAL;
>> +
>> + /* Double parse to make space for attributes */
>> + for (feat = 0; feat < num_features; feat++) {
>> + switch (ras_features[feat].ft_type) {
>> + /* Add feature specific code */
>> + default:
>> + return -EINVAL;
>> + }
>> + }
>> +
>> + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
>> + if (!ctx)
>> + return -ENOMEM;
>> +
>> + ras_attr_groups = kcalloc(attr_gcnt + 1, sizeof(*ras_attr_groups),
>GFP_KERNEL);
>> + if (!ras_attr_groups) {
>> + ret = -ENOMEM;
>> + goto ctx_free;
>> + }
>> +
>> + attr_gcnt = 0;
>> + for (feat = 0; feat < num_features; feat++, ras_features++) {
>> + switch (ras_features->ft_type) {
>> + /* Add feature specific code */
>> + default:
>> + ret = -EINVAL;
>> + goto groups_free;
>> + }
>> + }
>> +
>> + ctx->dev.parent = parent;
>> + ctx->dev.bus = edac_get_sysfs_subsys();
>> + ctx->dev.type = &edac_dev_type;
>> + ctx->dev.groups = ras_attr_groups;
>> + ctx->private = private;
>> + dev_set_drvdata(&ctx->dev, ctx);
>> +
>> + ret = dev_set_name(&ctx->dev, name);
>> + if (ret)
>> + goto groups_free;
>> +
>> + ret = device_register(&ctx->dev);
>> + if (ret) {
>> + put_device(&ctx->dev);
>
>> + return ret;
>
>As register failed, you need to change it to a goto groups_free, as
>edac_dev_release() won't be called.
As per experimentation edac_dev_release() will be called.
>
>> + }
>> +
>> + return devm_add_action_or_reset(parent, edac_dev_unreg, &ctx->dev);
>> +
>> +groups_free:
>> + kfree(ras_attr_groups);
>> +ctx_free:
>> + kfree(ctx);
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(edac_dev_register);
>> diff --git a/include/linux/edac.h b/include/linux/edac.h index
>> b4ee8961e623..521b17113d4d 100644
>> --- a/include/linux/edac.h
>> +++ b/include/linux/edac.h
>> @@ -661,4 +661,32 @@ static inline struct dimm_info
>> *edac_get_dimm(struct mem_ctl_info *mci,
>>
>> return mci->dimms[index];
>> }
>> +
>> +#define EDAC_FEAT_NAME_LEN 128
>
>This macro was not used on this patch.
Sure.
>
>> +
>> +/* RAS feature type */
>> +enum edac_dev_feat {
>> + RAS_FEAT_MAX
>> +};
>> +
>> +/* EDAC device feature information structure */ struct edac_dev_data
>> +{
>> + u8 instance;
>> + void *private;
>> +};
>> +
>> +struct edac_dev_feat_ctx {
>> + struct device dev;
>> + void *private;
>> +};
>> +
>> +struct edac_dev_feature {
>> + enum edac_dev_feat ft_type;
>> + u8 instance;
>> + void *ctx;
>> +};
>> +
>> +int edac_dev_register(struct device *parent, char *dev_name,
>> + void *parent_pvt_data, int num_features,
>> + const struct edac_dev_feature *ras_features);
>> #endif /* _LINUX_EDAC_H_ */
>
>Thanks,
>Mauro
Thanks,
Shiju
Powered by blists - more mailing lists