[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <8a9459c5-a4d7-4e8a-8e0c-505b19a74dd3@zohomail.com>
Date: Tue, 3 Jun 2025 18:13:35 +0800
From: Li Ming <ming.li@...omail.com>
To: Shiju Jose <shiju.jose@...wei.com>,
Alison Schofield <alison.schofield@...el.com>
Cc: "dave@...olabs.net" <dave@...olabs.net>,
Jonathan Cameron <jonathan.cameron@...wei.com>,
"dave.jiang@...el.com" <dave.jiang@...el.com>,
"vishal.l.verma@...el.com" <vishal.l.verma@...el.com>,
"ira.weiny@...el.com" <ira.weiny@...el.com>,
"dan.j.williams@...el.com" <dan.j.williams@...el.com>,
"linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
miscalculation
On 6/3/2025 6:11 PM, Shiju Jose wrote:
>> -----Original Message-----
>> From: Li Ming <ming.li@...omail.com>
>> Sent: 03 June 2025 00:57
>> To: Shiju Jose <shiju.jose@...wei.com>; Alison Schofield
>> <alison.schofield@...el.com>
>> Cc: dave@...olabs.net; Jonathan Cameron <jonathan.cameron@...wei.com>;
>> dave.jiang@...el.com; vishal.l.verma@...el.com; ira.weiny@...el.com;
>> dan.j.williams@...el.com; linux-cxl@...r.kernel.org; linux-
>> kernel@...r.kernel.org
>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
>> miscalculation
>>
>> On 6/3/2025 1:25 AM, Shiju Jose wrote:
>>>> -----Original Message-----
>>>> From: Alison Schofield <alison.schofield@...el.com>
>>>> Sent: 02 June 2025 17:48
>>>> To: Shiju Jose <shiju.jose@...wei.com>
>>>> Cc: Li Ming <ming.li@...omail.com>; dave@...olabs.net; Jonathan
>>>> Cameron <jonathan.cameron@...wei.com>; dave.jiang@...el.com;
>>>> vishal.l.verma@...el.com; ira.weiny@...el.com;
>>>> dan.j.williams@...el.com; linux- cxl@...r.kernel.org;
>>>> linux-kernel@...r.kernel.org
>>>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a
>>>> region miscalculation
>>>>
>>>> On Mon, Jun 02, 2025 at 08:23:34AM +0000, Shiju Jose wrote:
>>>>>> -----Original Message-----
>>>>>> From: Alison Schofield <alison.schofield@...el.com>
>>>>>> Sent: 30 May 2025 19:28
>>>>>> To: Li Ming <ming.li@...omail.com>
>>>>>> Cc: dave@...olabs.net; Jonathan Cameron
>>>>>> <jonathan.cameron@...wei.com>; dave.jiang@...el.com;
>>>>>> vishal.l.verma@...el.com; ira.weiny@...el.com;
>>>>>> dan.j.williams@...el.com; Shiju Jose <shiju.jose@...wei.com>;
>>>>>> linux- cxl@...r.kernel.org; linux-kernel@...r.kernel.org
>>>>>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a
>>>>>> region miscalculation
>>>>>>
>>>>>> On Fri, May 30, 2025 at 08:28:52PM +0800, Li Ming wrote:
>>>>>>> When trying to update the scrub_cycle value of a cxl region, which
>>>>>>> means updating the scrub_cycle value of each memdev under a cxl
>>>>>>> region. cxl driver needs to guarantee the new scrub_cycle value is
>>>>>>> greater than the min_scrub_cycle value of a memdev, otherwise the
>>>>>>> updating operation will fail(Per Table 8-223 in CXL r3.2 section
>>>> 8.2.10.9.11.1).
>>>>>>> Current implementation logic of getting the min_scrub_cycle value
>>>>>>> of a cxl region is that getting the min_scrub_cycle value of each
>>>>>>> memdevs under the cxl region, then using the minimum
>>>>>>> min_scrub_cycle value as the region's min_scrub_cycle. Checking if
>>>>>>> the new scrub_cycle value is greater than this value. If yes,
>>>>>>> updating the new scrub_cycle value to each memdevs. The issue is
>>>>>>> that the new scrub_cycle value is possibly greater than the
>>>>>>> minimum min_scrub_cycle value of all memdevs but less than the
>>>>>>> maximum min_scrub_cycle value of all memdevs if memdevs have a
>>>>>>> different min_scrub_cycle value. The updating operation will
>>>>>>> always fail on these memdevs which have a greater min_scrub_cycle
>>>>>>> than the new
>>>> scrub_cycle.
>>>>>>> The correct implementation logic is to get the maximum value of
>>>>>>> these memdevs' min_scrub_cycle, check if the new scrub_cycle value
>>>>>>> is greater than the value. If yes, the new scrub_cycle value is
>>>>>>> fit for the
>>>> region.
>>>>>>> The change also impacts the result of
>>>>>>> cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
>>>>>>> minimum min_scrub_cycle value among all memdevs under the region
>>>>>>> before the change. The interface will return the maximum
>>>>>>> min_scrub_cycle value among all memdevs under the region with the
>>>> change.
>>>>>>> Signed-off-by: Li Ming <ming.li@...omail.com>
>>>>>>> ---
>>>>>>> I made this change based on my understanding on the SPEC and
>>>>>>> current CXL EDAC code, but I am not sure if it is a bug or it is
>>>>>>> designed this
>>>> way.
>>>>>> The attribute is defined to show (per
>>>>>> Documentation/ABI/testing/sysfs-edac-
>>>>>> scrub)
>>>>>> "Supported minimum scrub cycle duration in seconds by the memory
>>>>>> scrubber."
>>>>>>
>>>>>> Your fix, making the min the max of the mins, looks needed.
>>>>>>
>>>>>> I took a look at the max attribute. If the min is the max on the
>>>>>> mins, then the max should be the max of the maxes. But, not true.
>>>>>> We do
>>>> this:
>>>>>> instead: *max = U8_MAX * 3600; /* Max set by register size */
>>>>>>
>>>>>> The comment isn't helping me, esp since the sysfs description
>>>>>> doesn't explain that we are using a constant max.
>>>>> CXL spec r3.2 Table 8-222. Device Patrol Scrub Control Feature
>>>>> Readable Attributes does not define a field for "max scrub cycle
>>>>> supported". Thus for max scrub cycle, returning max value of
>>>>> (U8_MAX) of
>>>> patrol scrub cycle field.
>>>>
>>>> Understand that now, thanks. I'm still wondering if both these
>>>> deserve more explanation in
>>>> Documentation/ABI/testing/sysfs-edac-scrub
>>>> explaining the calculations. Like if the device represents an
>>>> aggregate of devices, like a region, the min scrub cycle is the max
>>>> of the mins, whereas if the device is a single, it's exactly what the
>>>> device returned. And for max, explaining what you replied above.
>>> Not sure is it appropriate to add these CXL scrub specific details to the generic
>> file
>>> Documentation/ABI/testing/sysfs-edac-scrub?
>>>
>>> CXL region specific details were added under section 1.2. Region based
>>> scrubbing of Documentation/edac/scrub.rst. May be better add these
>>> details for CXL specific min and max scrub cycle calculation to the
>> Documentation/edac/scrub.rst?
>>> How do you want to post these suggested doc changes, in a follow-up patch
>> now?
>>> Thanks,
>>> Shiju
>> I can include the doc changes in next version.
> Thanks Ming.
>
> May be like this?
>
> diff --git a/Documentation/ABI/testing/sysfs-edac-scrub b/Documentation/ABI/testing/sysfs-edac-scrub
> index c43be90deab4..ab6014743da5 100644
> --- a/Documentation/ABI/testing/sysfs-edac-scrub
> +++ b/Documentation/ABI/testing/sysfs-edac-scrub
> @@ -49,6 +49,12 @@ Description:
> (RO) Supported minimum scrub cycle duration in seconds
> by the memory scrubber.
>
> + Device-based scrub: returns the minimum scrub cycle
> + supported by the memory device.
> +
> + Region-based scrub: returns the max of minimum scrub cycles
> + supported by individual memory devices that back the region.
> +
> What: /sys/bus/edac/devices/<dev-name>/scrubX/max_cycle_duration
> Date: March 2025
> KernelVersion: 6.15
> @@ -57,6 +63,16 @@ Description:
> (RO) Supported maximum scrub cycle duration in seconds
> by the memory scrubber.
>
> + Device-based scrub: returns the maximum scrub cycle supported
> + by the memory device.
> +
> + Region-based scrub: returns the min of maximum scrub cycles
> + supported by individual memory devices that back the region.
> +
> + If the memory device does not provide maximum scrub cycle
> + information, return the maximum supported value of the scrub
> + cycle field.
> +
> What: /sys/bus/edac/devices/<dev-name>/scrubX/current_cycle_duration
> Date: March 2025
> KernelVersion: 6.15
>
Sure, will do it, thanks.
Ming
>>
>> Thanks
>>
>> Ming
>>
>>
>>>> Regardless of this noise I'm making about the Docs.. I think Ming
>>>> should go ahead and v1 the fix for the min calc.
>>>>
>>>> --Alison
>>>>
>>>>> Thanks,
>>>>> Shiju
>>>>>>> base-commit: 9f153b7fb5ae45c7d426851f896487927f40e501 cxl/next
>>>>>>> ---
>>>>>>> drivers/cxl/core/edac.c | 8 ++++++--
>>>>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
>>>>>>> index
>>>>>>> 2cbc664e5d62..ad243cfe00e7 100644
>>>>>>> --- a/drivers/cxl/core/edac.c
>>>>>>> +++ b/drivers/cxl/core/edac.c
>>>>>>> @@ -103,10 +103,10 @@ static int cxl_scrub_get_attrbs(struct
>>>>>> cxl_patrol_scrub_context *cxl_ps_ctx,
>>>>>>> u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
>>>>>> {
>>>>>>> struct cxl_mailbox *cxl_mbox;
>>>>>>> - u8 min_scrub_cycle = U8_MAX;
>>>>>>> struct cxl_region_params *p;
>>>>>>> struct cxl_memdev *cxlmd;
>>>>>>> struct cxl_region *cxlr;
>>>>>>> + u8 min_scrub_cycle = 0;
>>>>>>> int i, ret;
>>>>>>>
>>>>>>> if (!cxl_ps_ctx->cxlr) {
>>>>>>> @@ -133,8 +133,12 @@ static int cxl_scrub_get_attrbs(struct
>>>>>> cxl_patrol_scrub_context *cxl_ps_ctx,
>>>>>>> if (ret)
>>>>>>> return ret;
>>>>>>>
>>>>>>> + /*
>>>>>>> + * The min_scrub_cycle of a region is the maximum
>> value
>>>>>> among
>>>>>>> + * the min_scrub_cycle of all the memdevs under the
>> region.
>>>>>>> + */
>>>>>>> if (min_cycle)
>>>>>>> - min_scrub_cycle = min(*min_cycle, min_scrub_cycle);
>>>>>>> + min_scrub_cycle = max(*min_cycle,
>> min_scrub_cycle);
>>>>>>> }
>>>>>>>
>>>>>>> if (min_cycle)
>>>>>>> --
>>>>>>> 2.34.1
>>>>>>>
Powered by blists - more mailing lists