Message-ID: <20250829161529.0000220a@huawei.com>
Date: Fri, 29 Aug 2025 16:15:29 +0100
From: Jonathan Cameron <jonathan.cameron@...wei.com>
To: Dave Jiang <dave.jiang@...el.com>
CC: <linux-cxl@...r.kernel.org>, <linux-acpi@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <gregkh@...uxfoundation.org>,
<rafael@...nel.org>, <dakr@...nel.org>, <dave@...olabs.net>,
<alison.schofield@...el.com>, <vishal.l.verma@...el.com>,
<ira.weiny@...el.com>, <dan.j.williams@...el.com>,
<marc.herbert@...ux.intel.com>, <akpm@...ux-foundation.org>,
<david@...hat.com>, <stable@...r.kernel.org>
Subject: Re: [PATCH v2 3/4] cxl, acpi/hmat: Update CXL access coordinates
directly instead of through HMAT
On Wed, 20 Aug 2025 12:47:03 -0700
Dave Jiang <dave.jiang@...el.com> wrote:
> The current implementation of CXL memory hotplug notifier gets called
> before the HMAT memory hotplug notifier. The CXL driver calculates the
> access coordinates (bandwidth and latency values) for the CXL end to
> end path (i.e. CPU to endpoint). When the CXL region is onlined, the CXL
> memory hotplug notifier writes the access coordinates to the HMAT target
> structs. Then the HMAT memory hotplug notifier is called and it creates
> the access coordinates for the node sysfs attributes.
>
> During testing on an Intel platform, it was found that although the
> newly calculated coordinates were pushed to sysfs, the sysfs attributes for
> the access coordinates showed up with the wrong initiator. The system has
> 4 nodes (0, 1, 2, 3) where node 0 and 1 are CPU nodes and node 2 and 3 are
> CXL nodes. The expectation is that node 2 would show up as a target to node
> 0:
> /sys/devices/system/node/node2/access0/initiators/node0
>
> However it was observed that node 2 showed up as a target under node 1:
> /sys/devices/system/node/node2/access0/initiators/node1
>
> The original intent of the 'ext_updated' flag in HMAT handling code was to
> stop HMAT memory hotplug callback from clobbering the access coordinates
> after CXL has injected its calculated coordinates and replaced the generic
> target access coordinates provided by the HMAT table in the HMAT target
> structs. However the flag is hacky at best and blocks the updates from
> other CXL regions that are onlined in the same node later on. Remove the
> 'ext_updated' flag usage and just update the access coordinates for the
> nodes directly without touching HMAT target data.
>
> The hotplug memory callback ordering is changed. Instead of changing the
> CXL priority, move HMAT back so there is room between the levels rather
> than have CXL share the same level as SLAB_CALLBACK_PRI. The change
> results in the CXL callback being executed after the HMAT callback.
>
> With the change, the CXL hotplug memory notifier runs after the HMAT
> callback. The HMAT callback will create the node sysfs attributes for
> access coordinates. The CXL callback will write the access coordinates to
> the now created node sysfs attributes directly and will not pollute the
> HMAT target values.
>
> Fixes: 067353a46d8c ("cxl/region: Add memory hotplug notifier for cxl region")
> Cc: stable@...r.kernel.org
> Tested-by: Marc Herbert <marc.herbert@...ux.intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@...el.com>
> Signed-off-by: Dave Jiang <dave.jiang@...el.com>
> ---
> v2:
> - Add description to what was observed for the issue. (Dan)
> - Use the correct Fixes tag. (Dan)
> - Add Cc to stable. (Dan)
> - Add support to only update on first region appearance. (Jonathan)
The implementation of this seems like overkill. See later, but in short
we should be able to use a nodemask_t.
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 71cc42d05248..371873fc43eb 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -9,6 +9,7 @@
> #include <linux/uuid.h>
> #include <linux/sort.h>
> #include <linux/idr.h>
> +#include <linux/xarray.h>
> #include <linux/memory-tiers.h>
> #include <cxlmem.h>
> #include <cxl.h>
> @@ -30,6 +31,9 @@
> * 3. Decoder targets
> */
>
> +/* xarray that stores the reference count per node for regions */
Reference count seems an over-the-top description.
I think it's just a flag that records whether this has happened already
or not. The term reference count would suggest it counts the regions
present.
So perhaps a comment along the lines of
/* xarray that stores if a region has previously been seen in a node */
and rename it to node_region_seen_xa.
However, can we instead use a bitmap sized to MAX_NUMNODES, i.e. a
nodemask_t, which gives us the helpers in nodemask.h?
> +static DEFINE_XARRAY(node_regions_xa);
> +
> static struct cxl_region *to_cxl_region(struct device *dev);
>
> #define __ACCESS_ATTR_RO(_level, _name) { \
> @@ -2442,14 +2446,8 @@ static bool cxl_region_update_coordinates(struct cxl_region *cxlr, int nid)
>
> for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
> if (cxlr->coord[i].read_bandwidth) {
> - rc = 0;
> - if (cxl_need_node_perf_attrs_update(nid))
> - node_set_perf_attrs(nid, &cxlr->coord[i], i);
> - else
> - rc = cxl_update_hmat_access_coordinates(nid, cxlr, i);
> -
> - if (rc == 0)
> - cset++;
> + node_update_perf_attrs(nid, &cxlr->coord[i], i);
> + cset++;
> }
> }
>
> @@ -2475,6 +2473,7 @@ static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
> struct node_notify *nn = arg;
> int nid = nn->nid;
> int region_nid;
> + int rc;
>
> if (action != NODE_ADDED_FIRST_MEMORY)
> return NOTIFY_DONE;
> @@ -2487,6 +2486,11 @@ static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
> if (nid != region_nid)
> return NOTIFY_DONE;
>
> + /* No action needed if there's existing entry */
> + rc = xa_insert(&node_regions_xa, nid, NULL, GFP_KERNEL);
So this is using the NULL-entry quirk of xa_insert(), where a reserved
(zero) entry is put in place so the next insert for this nid fails with
-EBUSY and we know we have already seen it.
> + if (rc < 0)
> + return NOTIFY_DONE;
> +
> if (!cxl_region_update_coordinates(cxlr, nid))
> return NOTIFY_DONE;
>
> @@ -3638,6 +3642,7 @@ int cxl_region_init(void)
>
> void cxl_region_exit(void)
> {
> + xa_destroy(&node_regions_xa);
> cxl_driver_unregister(&cxl_region_driver);
> }