[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <97301a67e111449a8ee1edb2fc59b3df@huawei.com>
Date: Tue, 3 Jun 2025 10:11:44 +0000
From: Shiju Jose <shiju.jose@...wei.com>
To: Li Ming <ming.li@...omail.com>, Alison Schofield
<alison.schofield@...el.com>
CC: "dave@...olabs.net" <dave@...olabs.net>, Jonathan Cameron
<jonathan.cameron@...wei.com>, "dave.jiang@...el.com" <dave.jiang@...el.com>,
"vishal.l.verma@...el.com" <vishal.l.verma@...el.com>, "ira.weiny@...el.com"
<ira.weiny@...el.com>, "dan.j.williams@...el.com" <dan.j.williams@...el.com>,
"linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
miscalculation
>-----Original Message-----
>From: Li Ming <ming.li@...omail.com>
>Sent: 03 June 2025 00:57
>To: Shiju Jose <shiju.jose@...wei.com>; Alison Schofield
><alison.schofield@...el.com>
>Cc: dave@...olabs.net; Jonathan Cameron <jonathan.cameron@...wei.com>;
>dave.jiang@...el.com; vishal.l.verma@...el.com; ira.weiny@...el.com;
>dan.j.williams@...el.com; linux-cxl@...r.kernel.org; linux-
>kernel@...r.kernel.org
>Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
>miscalculation
>
>On 6/3/2025 1:25 AM, Shiju Jose wrote:
>>> -----Original Message-----
>>> From: Alison Schofield <alison.schofield@...el.com>
>>> Sent: 02 June 2025 17:48
>>> To: Shiju Jose <shiju.jose@...wei.com>
>>> Cc: Li Ming <ming.li@...omail.com>; dave@...olabs.net; Jonathan
>>> Cameron <jonathan.cameron@...wei.com>; dave.jiang@...el.com;
>>> vishal.l.verma@...el.com; ira.weiny@...el.com;
>>> dan.j.williams@...el.com; linux- cxl@...r.kernel.org;
>>> linux-kernel@...r.kernel.org
>>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a
>>> region miscalculation
>>>
>>> On Mon, Jun 02, 2025 at 08:23:34AM +0000, Shiju Jose wrote:
>>>>> -----Original Message-----
>>>>> From: Alison Schofield <alison.schofield@...el.com>
>>>>> Sent: 30 May 2025 19:28
>>>>> To: Li Ming <ming.li@...omail.com>
>>>>> Cc: dave@...olabs.net; Jonathan Cameron
>>>>> <jonathan.cameron@...wei.com>; dave.jiang@...el.com;
>>>>> vishal.l.verma@...el.com; ira.weiny@...el.com;
>>>>> dan.j.williams@...el.com; Shiju Jose <shiju.jose@...wei.com>;
>>>>> linux- cxl@...r.kernel.org; linux-kernel@...r.kernel.org
>>>>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a
>>>>> region miscalculation
>>>>>
>>>>> On Fri, May 30, 2025 at 08:28:52PM +0800, Li Ming wrote:
>>>>>> When trying to update the scrub_cycle value of a cxl region, which
>>>>>> means updating the scrub_cycle value of each memdev under a cxl
>>>>>> region. cxl driver needs to guarantee the new scrub_cycle value is
>>>>>> greater than the min_scrub_cycle value of a memdev, otherwise the
>>>>>> updating operation will fail(Per Table 8-223 in CXL r3.2 section
>>> 8.2.10.9.11.1).
>>>>>> Current implementation logic of getting the min_scrub_cycle value
>>>>>> of a cxl region is that getting the min_scrub_cycle value of each
>>>>>> memdevs under the cxl region, then using the minimum
>>>>>> min_scrub_cycle value as the region's min_scrub_cycle. Checking if
>>>>>> the new scrub_cycle value is greater than this value. If yes,
>>>>>> updating the new scrub_cycle value to each memdevs. The issue is
>>>>>> that the new scrub_cycle value is possibly greater than the
>>>>>> minimum min_scrub_cycle value of all memdevs but less than the
>>>>>> maximum min_scrub_cycle value of all memdevs if memdevs have a
>>>>>> different min_scrub_cycle value. The updating operation will
>>>>>> always fail on these memdevs which have a greater min_scrub_cycle
>>>>>> than the new
>>> scrub_cycle.
>>>>>> The correct implementation logic is to get the maximum value of
>>>>>> these memdevs' min_scrub_cycle, check if the new scrub_cycle value
>>>>>> is greater than the value. If yes, the new scrub_cycle value is
>>>>>> fit for the
>>> region.
>>>>>> The change also impacts the result of
>>>>>> cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
>>>>>> minimum min_scrub_cycle value among all memdevs under the region
>>>>>> before the change. The interface will return the maximum
>>>>>> min_scrub_cycle value among all memdevs under the region with the
>>> change.
>>>>>> Signed-off-by: Li Ming <ming.li@...omail.com>
>>>>>> ---
>>>>>> I made this change based on my understanding on the SPEC and
>>>>>> current CXL EDAC code, but I am not sure if it is a bug or it is
>>>>>> designed this
>>> way.
>>>>> The attribute is defined to show (per
>>>>> Documentation/ABI/testing/sysfs-edac-
>>>>> scrub)
>>>>> "Supported minimum scrub cycle duration in seconds by the memory
>>>>> scrubber."
>>>>>
>>>>> Your fix, making the min the max of the mins, looks needed.
>>>>>
>>>>> I took a look at the max attribute. If the min is the max on the
>>>>> mins, then the max should be the max of the maxes. But, not true.
>>>>> We do
>>> this:
>>>>> instead: *max = U8_MAX * 3600; /* Max set by register size */
>>>>>
>>>>> The comment isn't helping me, esp since the sysfs description
>>>>> doesn't explain that we are using a constant max.
>>>> CXL spec r3.2 Table 8-222. Device Patrol Scrub Control Feature
>>>> Readable Attributes does not define a field for "max scrub cycle
>>>> supported". Thus for max scrub cycle, returning max value of
>>>> (U8_MAX) of
>>> patrol scrub cycle field.
>>>
>>> Understand that now, thanks. I'm still wondering if both these
>>> deserve more explanation in
>>> Documentation/ABI/testing/sysfs-edac-scrub
>>> explaining the calculations. Like if the device represents an
>>> aggregate of devices, like a region, the min scrub cycle is the max
>>> of the mins, whereas if the device is a single, it's exactly what the
>>> device returned. And for max, explaining what you replied above.
>> Not sure is it appropriate to add these CXL scrub specific details to the generic
>file
>> Documentation/ABI/testing/sysfs-edac-scrub?
>>
>> CXL region specific details were added under section 1.2. Region based
>> scrubbing of Documentation/edac/scrub.rst. May be better add these
>> details for CXL specific min and max scrub cycle calculation to the
>Documentation/edac/scrub.rst?
>>
>> How do you want to post these suggested doc changes, in a follow-up patch
>now?
>>
>> Thanks,
>> Shiju
>
>I can include the doc changes in next version.
Thanks Ming.
May be like this?
diff --git a/Documentation/ABI/testing/sysfs-edac-scrub b/Documentation/ABI/testing/sysfs-edac-scrub
index c43be90deab4..ab6014743da5 100644
--- a/Documentation/ABI/testing/sysfs-edac-scrub
+++ b/Documentation/ABI/testing/sysfs-edac-scrub
@@ -49,6 +49,12 @@ Description:
(RO) Supported minimum scrub cycle duration in seconds
by the memory scrubber.
+ Device-based scrub: returns the minimum scrub cycle
+ supported by the memory device.
+
+ Region-based scrub: returns the max of minimum scrub cycles
+ supported by individual memory devices that back the region.
+
What: /sys/bus/edac/devices/<dev-name>/scrubX/max_cycle_duration
Date: March 2025
KernelVersion: 6.15
@@ -57,6 +63,16 @@ Description:
(RO) Supported maximum scrub cycle duration in seconds
by the memory scrubber.
+ Device-based scrub: returns the maximum scrub cycle supported
+ by the memory device.
+
+ Region-based scrub: returns the min of maximum scrub cycles
+ supported by individual memory devices that back the region.
+
+ If the memory device does not provide maximum scrub cycle
+ information, return the maximum supported value of the scrub
+ cycle field.
+
What: /sys/bus/edac/devices/<dev-name>/scrubX/current_cycle_duration
Date: March 2025
KernelVersion: 6.15
>
>
>Thanks
>
>Ming
>
>
>>> Regardless of this noise I'm making about the Docs.. I think Ming
>>> should go ahead and v1 the fix for the min calc.
>>>
>>> --Alison
>>>
>>>> Thanks,
>>>> Shiju
>>>>>
>>>>>> base-commit: 9f153b7fb5ae45c7d426851f896487927f40e501 cxl/next
>>>>>> ---
>>>>>> drivers/cxl/core/edac.c | 8 ++++++--
>>>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
>>>>>> index
>>>>>> 2cbc664e5d62..ad243cfe00e7 100644
>>>>>> --- a/drivers/cxl/core/edac.c
>>>>>> +++ b/drivers/cxl/core/edac.c
>>>>>> @@ -103,10 +103,10 @@ static int cxl_scrub_get_attrbs(struct
>>>>> cxl_patrol_scrub_context *cxl_ps_ctx,
>>>>>> u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
>>>>> {
>>>>>> struct cxl_mailbox *cxl_mbox;
>>>>>> - u8 min_scrub_cycle = U8_MAX;
>>>>>> struct cxl_region_params *p;
>>>>>> struct cxl_memdev *cxlmd;
>>>>>> struct cxl_region *cxlr;
>>>>>> + u8 min_scrub_cycle = 0;
>>>>>> int i, ret;
>>>>>>
>>>>>> if (!cxl_ps_ctx->cxlr) {
>>>>>> @@ -133,8 +133,12 @@ static int cxl_scrub_get_attrbs(struct
>>>>> cxl_patrol_scrub_context *cxl_ps_ctx,
>>>>>> if (ret)
>>>>>> return ret;
>>>>>>
>>>>>> + /*
>>>>>> + * The min_scrub_cycle of a region is the maximum
>value
>>>>> among
>>>>>> + * the min_scrub_cycle of all the memdevs under the
>region.
>>>>>> + */
>>>>>> if (min_cycle)
>>>>>> - min_scrub_cycle = min(*min_cycle, min_scrub_cycle);
>>>>>> + min_scrub_cycle = max(*min_cycle,
>min_scrub_cycle);
>>>>>> }
>>>>>>
>>>>>> if (min_cycle)
>>>>>> --
>>>>>> 2.34.1
>>>>>>
Powered by blists - more mailing lists