[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d036c22fc0e348a3ba7ecb0b4dbf35ba@huawei.com>
Date: Mon, 2 Jun 2025 17:25:11 +0000
From: Shiju Jose <shiju.jose@...wei.com>
To: Alison Schofield <alison.schofield@...el.com>
CC: Li Ming <ming.li@...omail.com>, "dave@...olabs.net" <dave@...olabs.net>,
Jonathan Cameron <jonathan.cameron@...wei.com>, "dave.jiang@...el.com"
<dave.jiang@...el.com>, "vishal.l.verma@...el.com"
<vishal.l.verma@...el.com>, "ira.weiny@...el.com" <ira.weiny@...el.com>,
"dan.j.williams@...el.com" <dan.j.williams@...el.com>,
"linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
miscalculation
>-----Original Message-----
>From: Alison Schofield <alison.schofield@...el.com>
>Sent: 02 June 2025 17:48
>To: Shiju Jose <shiju.jose@...wei.com>
>Cc: Li Ming <ming.li@...omail.com>; dave@...olabs.net; Jonathan Cameron
><jonathan.cameron@...wei.com>; dave.jiang@...el.com;
>vishal.l.verma@...el.com; ira.weiny@...el.com; dan.j.williams@...el.com; linux-
>cxl@...r.kernel.org; linux-kernel@...r.kernel.org
>Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a region
>miscalculation
>
>On Mon, Jun 02, 2025 at 08:23:34AM +0000, Shiju Jose wrote:
>> >-----Original Message-----
>> >From: Alison Schofield <alison.schofield@...el.com>
>> >Sent: 30 May 2025 19:28
>> >To: Li Ming <ming.li@...omail.com>
>> >Cc: dave@...olabs.net; Jonathan Cameron
>> ><jonathan.cameron@...wei.com>; dave.jiang@...el.com;
>> >vishal.l.verma@...el.com; ira.weiny@...el.com;
>> >dan.j.williams@...el.com; Shiju Jose <shiju.jose@...wei.com>; linux-
>> >cxl@...r.kernel.org; linux-kernel@...r.kernel.org
>> >Subject: Re: [RFC PATCH 1/1] cxl/edac: Fix the min_scrub_cycle of a
>> >region miscalculation
>> >
>> >On Fri, May 30, 2025 at 08:28:52PM +0800, Li Ming wrote:
>> >> When trying to update the scrub_cycle value of a cxl region, which
>> >> means updating the scrub_cycle value of each memdev under a cxl
>> >> region. cxl driver needs to guarantee the new scrub_cycle value is
>> >> greater than the min_scrub_cycle value of a memdev, otherwise the
>> >> updating operation will fail(Per Table 8-223 in CXL r3.2 section
>8.2.10.9.11.1).
>> >>
>> >> Current implementation logic of getting the min_scrub_cycle value
>> >> of a cxl region is that getting the min_scrub_cycle value of each
>> >> memdevs under the cxl region, then using the minimum
>> >> min_scrub_cycle value as the region's min_scrub_cycle. Checking if
>> >> the new scrub_cycle value is greater than this value. If yes,
>> >> updating the new scrub_cycle value to each memdevs. The issue is
>> >> that the new scrub_cycle value is possibly greater than the minimum
>> >> min_scrub_cycle value of all memdevs but less than the maximum
>> >> min_scrub_cycle value of all memdevs if memdevs have a different
>> >> min_scrub_cycle value. The updating operation will always fail on
>> >> these memdevs which have a greater min_scrub_cycle than the new
>scrub_cycle.
>> >>
>> >> The correct implementation logic is to get the maximum value of
>> >> these memdevs' min_scrub_cycle, check if the new scrub_cycle value
>> >> is greater than the value. If yes, the new scrub_cycle value is fit for the
>region.
>> >>
>> >> The change also impacts the result of
>> >> cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the
>> >> minimum min_scrub_cycle value among all memdevs under the region
>> >> before the change. The interface will return the maximum
>> >> min_scrub_cycle value among all memdevs under the region with the
>change.
>> >>
>> >> Signed-off-by: Li Ming <ming.li@...omail.com>
>> >> ---
>> >> I made this change based on my understanding on the SPEC and
>> >> current CXL EDAC code, but I am not sure if it is a bug or it is designed this
>way.
>> >
>> >The attribute is defined to show (per
>> >Documentation/ABI/testing/sysfs-edac-
>> >scrub)
>> > "Supported minimum scrub cycle duration in seconds by the memory
>> >scrubber."
>> >
>> >Your fix, making the min the max of the mins, looks needed.
>> >
>> >I took a look at the max attribute. If the min is the max on the
>> >mins, then the max should be the max of the maxes. But, not true. We do
>this:
>> >
>> >instead: *max = U8_MAX * 3600; /* Max set by register size */
>> >
>> >The comment isn't helping me, esp since the sysfs description doesn't
>> >explain that we are using a constant max.
>> CXL spec r3.2 Table 8-222. Device Patrol Scrub Control Feature
>> Readable Attributes does not define a field for "max scrub cycle
>> supported". Thus for max scrub cycle, returning max value of (U8_MAX) of
>patrol scrub cycle field.
>
>Understand that now, thanks. I'm still wondering if both these deserve more
>explanation in Documentation/ABI/testing/sysfs-edac-scrub
>explaining the calculations. Like if the device represents an aggregate of
>devices, like a region, the min scrub cycle is the max of the mins, whereas if the
>device is a single, it's exactly what the device returned. And for max, explaining
>what you replied above.
Not sure is it appropriate to add these CXL scrub specific details to the generic file
Documentation/ABI/testing/sysfs-edac-scrub?
CXL region specific details were added under section 1.2. Region based scrubbing
of Documentation/edac/scrub.rst. May be better add these details for CXL specific
min and max scrub cycle calculation to the Documentation/edac/scrub.rst?
How do you want to post these suggested doc changes, in a follow-up patch now?
Thanks,
Shiju
>
>Regardless of this noise I'm making about the Docs.. I think Ming should go
>ahead and v1 the fix for the min calc.
>
>--Alison
>
>>
>> Thanks,
>> Shiju
>> >
>> >
>> >>
>> >> base-commit: 9f153b7fb5ae45c7d426851f896487927f40e501 cxl/next
>> >> ---
>> >> drivers/cxl/core/edac.c | 8 ++++++--
>> >> 1 file changed, 6 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c
>> >> index
>> >> 2cbc664e5d62..ad243cfe00e7 100644
>> >> --- a/drivers/cxl/core/edac.c
>> >> +++ b/drivers/cxl/core/edac.c
>> >> @@ -103,10 +103,10 @@ static int cxl_scrub_get_attrbs(struct
>> >cxl_patrol_scrub_context *cxl_ps_ctx,
>> >> u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
>> >{
>> >> struct cxl_mailbox *cxl_mbox;
>> >> - u8 min_scrub_cycle = U8_MAX;
>> >> struct cxl_region_params *p;
>> >> struct cxl_memdev *cxlmd;
>> >> struct cxl_region *cxlr;
>> >> + u8 min_scrub_cycle = 0;
>> >> int i, ret;
>> >>
>> >> if (!cxl_ps_ctx->cxlr) {
>> >> @@ -133,8 +133,12 @@ static int cxl_scrub_get_attrbs(struct
>> >cxl_patrol_scrub_context *cxl_ps_ctx,
>> >> if (ret)
>> >> return ret;
>> >>
>> >> + /*
>> >> + * The min_scrub_cycle of a region is the maximum value
>> >among
>> >> + * the min_scrub_cycle of all the memdevs under the region.
>> >> + */
>> >> if (min_cycle)
>> >> - min_scrub_cycle = min(*min_cycle, min_scrub_cycle);
>> >> + min_scrub_cycle = max(*min_cycle, min_scrub_cycle);
>> >> }
>> >>
>> >> if (min_cycle)
>> >> --
>> >> 2.34.1
>> >>
>>
Powered by blists - more mailing lists