[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241022190454.GIZxf3VkmLVR-JLeUc@fat_crate.local>
Date: Tue, 22 Oct 2024 21:04:54 +0200
From: Borislav Petkov <bp@...en8.de>
To: shiju.jose@...wei.com
Cc: linux-edac@...r.kernel.org, linux-cxl@...r.kernel.org,
linux-acpi@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, tony.luck@...el.com,
rafael@...nel.org, lenb@...nel.org, mchehab@...nel.org,
dan.j.williams@...el.com, dave@...olabs.net,
jonathan.cameron@...wei.com, dave.jiang@...el.com,
alison.schofield@...el.com, vishal.l.verma@...el.com,
ira.weiny@...el.com, david@...hat.com, Vilas.Sridharan@....com,
leo.duran@....com, Yazen.Ghannam@....com, rientjes@...gle.com,
jiaqiyan@...gle.com, Jon.Grimm@....com, dave.hansen@...ux.intel.com,
naoya.horiguchi@....com, james.morse@....com, jthoughton@...gle.com,
somasundaram.a@....com, erdemaktas@...gle.com, pgonda@...gle.com,
duenwen@...gle.com, gthelen@...gle.com,
wschwartz@...erecomputing.com, dferguson@...erecomputing.com,
wbs@...amperecomputing.com, nifan.cxl@...il.com,
tanxiaofei@...wei.com, prime.zeng@...ilicon.com,
roberto.sassu@...wei.com, kangkang.shen@...urewei.com,
wanghuiqiang@...wei.com, linuxarm@...wei.com
Subject: Re: [PATCH v13 02/18] EDAC: Add scrub control feature
On Wed, Oct 09, 2024 at 01:41:03PM +0100, shiju.jose@...wei.com wrote:
> diff --git a/Documentation/ABI/testing/sysfs-edac-scrub b/Documentation/ABI/testing/sysfs-edac-scrub
> new file mode 100644
> index 000000000000..b4f8f0bba17b
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-edac-scrub
> @@ -0,0 +1,69 @@
> +What: /sys/bus/edac/devices/<dev-name>/scrubX
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@...r.kernel.org
> +Description:
> + The sysfs EDAC bus devices /<dev-name>/scrubX subdirectory
> + belongs to an instance of memory scrub control feature,
> + where <dev-name> directory corresponds to a device/memory
> + region registered with the EDAC device driver for the
> + scrub control feature.
> + The sysfs scrub attr nodes would be present only if the
> + client driver has implemented the corresponding attr
> + callback function and passed in ops to the EDAC RAS feature
> + driver during registration.
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrubX/addr_range_base
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@...r.kernel.org
> +Description:
> + (RW) The base of the address range of the memory region
> + to be scrubbed (on-demand scrubbing).
Why does this sound more complicated than it is?
Why isn't this simply "addr" and the next one "size"?
On-demand scrubbing should scrub at address "addr" and "size" bytes. Can't get
any simpler than that.
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrubX/addr_range_size
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@...r.kernel.org
> +Description:
> + (RW) The size of the address range of the memory region
> + to be scrubbed (on-demand scrubbing).
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrubX/enable_background
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@...r.kernel.org
> +Description:
> + (RW) Start/Stop background(patrol) scrubbing if supported.
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrubX/enable_on_demand
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@...r.kernel.org
> +Description:
> + (RW) Start/Stop on-demand scrubbing the memory region
> + if supported.
Why do you need a separate "enable" flag?
Why can't it be: "writing into "addr" starts the on-demand scrubbing"?
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrubX/min_cycle_duration
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@...r.kernel.org
> +Description:
> + (RO) Supported minimum scrub cycle duration in seconds
> + by the memory scrubber.
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrubX/max_cycle_duration
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@...r.kernel.org
> +Description:
> + (RO) Supported maximum scrub cycle duration in seconds
> + by the memory scrubber.
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrubX/current_cycle_duration
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@...r.kernel.org
> +Description:
> + (RW) The current scrub cycle duration in seconds and must be
> + within the supported range by the memory scrubber.
What are those three good for and why are they exposed?
> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
> index 4edfb83ffbee..a96a74de8b9e 100644
> --- a/drivers/edac/Makefile
> +++ b/drivers/edac/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o
>
> edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o
> edac_core-y += edac_module.o edac_device_sysfs.o wq.o
> +edac_core-y += scrub.o
>
> edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o
>
> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
> index 0b8aa8150239..0c9da55d18bc 100644
> --- a/drivers/edac/edac_device.c
> +++ b/drivers/edac/edac_device.c
> @@ -576,6 +576,7 @@ static void edac_dev_release(struct device *dev)
> {
> struct edac_dev_feat_ctx *ctx = container_of(dev, struct edac_dev_feat_ctx, dev);
>
> + kfree(ctx->scrub);
> kfree(ctx->dev.groups);
> kfree(ctx);
> }
> @@ -610,7 +611,9 @@ int edac_dev_register(struct device *parent, char *name,
> const struct edac_dev_feature *ras_features)
> {
> const struct attribute_group **ras_attr_groups;
> + struct edac_dev_data *dev_data;
> struct edac_dev_feat_ctx *ctx;
> + int scrub_cnt = 0, scrub_inst = 0;
> int attr_gcnt = 0;
> int ret, feat;
The EDAC tree preferred ordering of variable declarations at the
beginning of a function is reverse fir tree order::
struct long_struct_name *descriptive_name;
unsigned long foo, bar;
unsigned int tmp;
int ret;
The above is faster to parse than the reverse ordering::
int ret;
unsigned int tmp;
unsigned long foo, bar;
struct long_struct_name *descriptive_name;
And even more so than random ordering::
unsigned long foo, bar;
int ret;
struct long_struct_name *descriptive_name;
unsigned int tmp;
>
> @@ -620,7 +623,10 @@ int edac_dev_register(struct device *parent, char *name,
> /* Double parse to make space for attributes */
> for (feat = 0; feat < num_features; feat++) {
> switch (ras_features[feat].ft_type) {
> - /* Add feature specific code */
> + case RAS_FEAT_SCRUB:
> + attr_gcnt++;
> + scrub_cnt++;
> + break;
> default:
> return -EINVAL;
> }
> @@ -639,13 +645,36 @@ int edac_dev_register(struct device *parent, char *name,
> goto ctx_free;
> }
>
> + if (scrub_cnt) {
> + ctx->scrub = kcalloc(scrub_cnt, sizeof(*ctx->scrub), GFP_KERNEL);
> + if (!ctx->scrub) {
> + ret = -ENOMEM;
> + goto groups_free;
> + }
> + }
> +
> attr_gcnt = 0;
> for (feat = 0; feat < num_features; feat++, ras_features++) {
> switch (ras_features->ft_type) {
> - /* Add feature specific code */
> + case RAS_FEAT_SCRUB:
> + if (!ras_features->scrub_ops)
> + continue;
Continue?
I think fail. What is a scrub feature good for if it doesn't have ops?
> + if (scrub_inst != ras_features->instance)
> + goto data_mem_free;
> + dev_data = &ctx->scrub[scrub_inst];
> + dev_data->instance = scrub_inst;
> + dev_data->scrub_ops = ras_features->scrub_ops;
> + dev_data->private = ras_features->ctx;
> + ret = edac_scrub_get_desc(parent, &ras_attr_groups[attr_gcnt],
> + ras_features->instance);
> + if (ret)
> + goto data_mem_free;
> + scrub_inst++;
> + attr_gcnt++;
> + break;
> default:
> ret = -EINVAL;
> - goto groups_free;
> + goto data_mem_free;
> }
> }
>
...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Powered by blists - more mailing lists