[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <96fedb6e-6184-4609-b249-4289f61d5003@zohomail.com>
Date: Tue, 3 Jun 2025 07:57:15 +0800
From: Li Ming <ming.li@...omail.com>
To: Shiju Jose <shiju.jose@...wei.com>,
Alison Schofield <alison.schofield@...el.com>
Cc: "dave@...olabs.net" <dave@...olabs.net>,
Jonathan Cameron <jonathan.cameron@...wei.com>,
"dave.jiang@...el.com" <dave.jiang@...el.com>,
"vishal.l.verma@...el.com" <vishal.l.verma@...el.com>,
"ira.weiny@...el.com" <ira.weiny@...el.com>,
"dan.j.williams@...el.com" <dan.j.williams@...el.com>,
"linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
miscalculation
On 6/3/2025 1:25 AM, Shiju Jose wrote:
>> -----Original Message-----
>> From: Alison Schofield <alison.schofield@...el.com>
>> Sent: 02 June 2025 17:48
>> To: Shiju Jose <shiju.jose@...wei.com>
>> Cc: Li Ming <ming.li@...omail.com>; dave@...olabs.net; Jonathan Cameron
>> <jonathan.cameron@...wei.com>; dave.jiang@...el.com;
>> vishal.l.verma@...el.com; ira.weiny@...el.com; dan.j.williams@...el.com; linux-
>> cxl@...r.kernel.org; linux-kernel@...r.kernel.org
>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
>> miscalculation
>>
>> On Mon, Jun 02, 2025 at 08:23:34AM +0000, Shiju Jose wrote:
>>>> -----Original Message-----
>>>> From: Alison Schofield <alison.schofield@...el.com>
>>>> Sent: 30 May 2025 19:28
>>>> To: Li Ming <ming.li@...omail.com>
>>>> Cc: dave@...olabs.net; Jonathan Cameron
>>>> <jonathan.cameron@...wei.com>; dave.jiang@...el.com;
>>>> vishal.l.verma@...el.com; ira.weiny@...el.com;
>>>> dan.j.williams@...el.com; Shiju Jose <shiju.jose@...wei.com>; linux-
>>>> cxl@...r.kernel.org; linux-kernel@...r.kernel.org
>>>> Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a
>>>> region miscalculation
>>>>
>>>> On Fri, May 30, 2025 at 08:28:52PM +0800, Li Ming wrote:
>>>>> When trying to update the scrub_cycle value of a cxl region, which
>>>>> means updating the scrub_cycle value of each memdev under a cxl
>>>>> region. cxl driver needs to guarantee the new scrub_cycle value is
>>>>> greater than the min_scrub_cycle value of a memdev, otherwise the
>>>>> updating operation will fail(Per Table 8-223 in CXL r3.2 section
>> 8.2.10.9.11.1).
>>>>> Current implementation logic of getting the min_scrub_cycle value
>>>>> of a cxl region is that getting the min_scrub_cycle value of each
>>>>> memdevs under the cxl region, then using the minimum
>>>>> min_scrub_cycle value as the region's min_scrub_cycle. Checking if
>>>>> the new scrub_cycle value is greater than this value. If yes,
>>>>> updating the new scrub_cycle value to each memdevs. The issue is
>>>>> that the new scrub_cycle value is possibly greater than the minimum
>>>>> min_scrub_cycle value of all memdevs but less than the maximum
>>>>> min_scrub_cycle value of all memdevs if memdevs have a different
>>>>> min_scrub_cycle value. The updating operation will always fail on
>>>>> these memdevs which have a greater min_scrub_cycle than the new
>> scrub_cycle.
>>>>> The correct implementation logic is to get the maximum value of
>>>>> these memdevs' min_scrub_cycle, check if the new scrub_cycle value
>>>>> is greater than the value. If yes, the new scrub_cycle value is fit for the
>> region.
>>>>> The change also impacts the result of
>>>>> cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
>>>>> minimum min_scrub_cycle value among all memdevs under the region
>>>>> before the change. The interface will return the maximum
>>>>> min_scrub_cycle value among all memdevs under the region with the
>> change.
>>>>> Signed-off-by: Li Ming <ming.li@...omail.com>
>>>>> ---
>>>>> I made this change based on my understanding on the SPEC and
>>>>> current CXL EDAC code, but I am not sure if it is a bug or it is designed this
>> way.
>>>> The attribute is defined to show (per
>>>> Documentation/ABI/testing/sysfs-edac-
>>>> scrub)
>>>> "Supported minimum scrub cycle duration in seconds by the memory
>>>> scrubber."
>>>>
>>>> Your fix, making the min the max of the mins, looks needed.
>>>>
>>>> I took a look at the max attribute. If the min is the max on the
>>>> mins, then the max should be the max of the maxes. But, not true. We do
>> this:
>>>> instead: *max = U8_MAX * 3600; /* Max set by register size */
>>>>
>>>> The comment isn't helping me, esp since the sysfs description doesn't
>>>> explain that we are using a constant max.
>>> CXL spec r3.2 Table 8-222. Device Patrol Scrub Control Feature
>>> Readable Attributes does not define a field for "max scrub cycle
>>> supported". Thus for max scrub cycle, returning max value of (U8_MAX) of
>> patrol scrub cycle field.
>>
>> Understand that now, thanks. I'm still wondering if both these deserve more
>> explanation in Documentation/ABI/testing/sysfs-edac-scrub
>> explaining the calculations. Like if the device represents an aggregate of
>> devices, like a region, the min scrub cycle is the max of the mins, whereas if the
>> device is a single, it's exactly what the device returned. And for max, explaining
>> what you replied above.
> Not sure is it appropriate to add these CXL scrub specific details to the generic file
> Documentation/ABI/testing/sysfs-edac-scrub?
>
> CXL region specific details were added under section 1.2. Region based scrubbing
> of Documentation/edac/scrub.rst. May be better add these details for CXL specific
> min and max scrub cycle calculation to the Documentation/edac/scrub.rst?
>
> How do you want to post these suggested doc changes, in a follow-up patch now?
>
> Thanks,
> Shiju
I can include the doc changes in next version.
Thanks
Ming
>> Regardless of this noise I'm making about the Docs.. I think Ming should go
>> ahead and v1 the fix for the min calc.
>>
>> --Alison
>>
>>> Thanks,
>>> Shiju
>>>>
>>>>> base-commit: 9f153b7fb5ae45c7d426851f896487927f40e501 cxl/next
>>>>> ---
>>>>> drivers/cxl/core/edac.c | 8 ++++++--
>>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
>>>>> index
>>>>> 2cbc664e5d62..ad243cfe00e7 100644
>>>>> --- a/drivers/cxl/core/edac.c
>>>>> +++ b/drivers/cxl/core/edac.c
>>>>> @@ -103,10 +103,10 @@ static int cxl_scrub_get_attrbs(struct
>>>> cxl_patrol_scrub_context *cxl_ps_ctx,
>>>>> u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
>>>> {
>>>>> struct cxl_mailbox *cxl_mbox;
>>>>> - u8 min_scrub_cycle = U8_MAX;
>>>>> struct cxl_region_params *p;
>>>>> struct cxl_memdev *cxlmd;
>>>>> struct cxl_region *cxlr;
>>>>> + u8 min_scrub_cycle = 0;
>>>>> int i, ret;
>>>>>
>>>>> if (!cxl_ps_ctx->cxlr) {
>>>>> @@ -133,8 +133,12 @@ static int cxl_scrub_get_attrbs(struct
>>>> cxl_patrol_scrub_context *cxl_ps_ctx,
>>>>> if (ret)
>>>>> return ret;
>>>>>
>>>>> + /*
>>>>> + * The min_scrub_cycle of a region is the maximum value
>>>> among
>>>>> + * the min_scrub_cycle of all the memdevs under the region.
>>>>> + */
>>>>> if (min_cycle)
>>>>> - min_scrub_cycle = min(*min_cycle, min_scrub_cycle);
>>>>> + min_scrub_cycle = max(*min_cycle, min_scrub_cycle);
>>>>> }
>>>>>
>>>>> if (min_cycle)
>>>>> --
>>>>> 2.34.1
>>>>>
Powered by blists - more mailing lists