[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240823170924.00002456@Huawei.com>
Date: Fri, 23 Aug 2024 17:09:24 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: <ira.weiny@...el.com>
CC: Dave Jiang <dave.jiang@...el.com>, Fan Ni <fan.ni@...sung.com>, "Navneet
Singh" <navneet.singh@...el.com>, Chris Mason <clm@...com>, Josef Bacik
<josef@...icpanda.com>, David Sterba <dsterba@...e.com>, Petr Mladek
<pmladek@...e.com>, Steven Rostedt <rostedt@...dmis.org>, Andy Shevchenko
<andriy.shevchenko@...ux.intel.com>, Rasmus Villemoes
<linux@...musvillemoes.dk>, Sergey Senozhatsky <senozhatsky@...omium.org>,
Jonathan Corbet <corbet@....net>, Andrew Morton <akpm@...ux-foundation.org>,
Dan Williams <dan.j.williams@...el.com>, Davidlohr Bueso <dave@...olabs.net>,
Alison Schofield <alison.schofield@...el.com>, Vishal Verma
<vishal.l.verma@...el.com>, <linux-btrfs@...r.kernel.org>,
<linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-doc@...r.kernel.org>, <nvdimm@...ts.linux.dev>
Subject: Re: [PATCH v3 09/25] cxl/hdm: Add dynamic capacity size support to
endpoint decoders
On Fri, 16 Aug 2024 09:44:17 -0500
ira.weiny@...el.com wrote:
> From: Navneet Singh <navneet.singh@...el.com>
>
> To support Dynamic Capacity Devices (DCD) endpoint decoders will need to
> map DC partitions (regions). In addition to assigning the size of the
> DC partition, the decoder must assign any skip value from the previous
> decoder. This must be done within a contiguous DPA space.
>
> Two complications arise with Dynamic Capacity regions which did not
> exist with Ram and PMEM partitions. First, gaps in the DPA space can
> exist between and around the DC partitions. Second, the Linux resource
> tree does not allow a resource to be marked across existing nodes within
> a tree.
>
> For clarity, below is an example of an 60GB device with 10GB of RAM,
> 10GB of PMEM and 10GB for each of 2 DC partitions. The desired CXL
> mapping is 5GB of RAM, 5GB of PMEM, and 5GB of DC1.
>
> DPA RANGE
> (dpa_res)
> 0GB 10GB 20GB 30GB 40GB 50GB 60GB
> |----------|----------|----------|----------|----------|----------|
>
> RAM PMEM DC0 DC1
> (ram_res) (pmem_res) (dc_res[0]) (dc_res[1])
> |----------|----------| <gap> |----------| <gap> |----------|
>
> RAM PMEM DC1
> |XXXXX|----|XXXXX|----|----------|----------|----------|XXXXX-----|
> 0GB 5GB 10GB 15GB 20GB 30GB 40GB 50GB 60GB
>
> The previous skip resource between RAM and PMEM was always a child of
> the RAM resource and fit nicely [see (S) below]. Because of this
> simplicity this skip resource reference was not stored in any CXL state.
> On release the skip range could be calculated based on the endpoint
> decoders stored values.
>
> Now when DC1 is being mapped 4 skip resources must be created as
> children. One for the PMEM resource (A), two of the parent DPA resource
> (B,D), and one more child of the DC0 resource (C).
>
> 0GB 10GB 20GB 30GB 40GB 50GB 60GB
> |----------|----------|----------|----------|----------|----------|
> | |
> |----------|----------| | |----------| | |----------|
> | | | | |
> (S) (A) (B) (C) (D)
> v v v v v
> |XXXXX|----|XXXXX|----|----------|----------|----------|XXXXX-----|
> skip skip skip skip skip
>
> Expand the calculation of DPA free space and enhance the logic to
> support this more complex skipping. To track the potential of multiple
> skip resources an xarray is attached to the endpoint decoder. The
> existing algorithm between RAM and PMEM is consolidated within the new
> one to streamline the code even though the result is the storage of a
> single skip resource in the xarray.
>
> Signed-off-by: Navneet Singh <navneet.singh@...el.com>
> Co-developed-by: Ira Weiny <ira.weiny@...el.com>
> Signed-off-by: Ira Weiny <ira.weiny@...el.com>
>
One query below + request to add a comment on it for when I've
again completely forgotten how this works.
Also a grumpy reviewer comment.
> +static int cxl_reserve_dpa_skip(struct cxl_endpoint_decoder *cxled,
> + resource_size_t base, resource_size_t skipped)
> +{
> + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> + struct cxl_port *port = cxled_to_port(cxled);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + resource_size_t skip_base = base - skipped;
> + struct device *dev = &port->dev;
> + resource_size_t skip_len = 0;
> + int rc, index;
> +
> + index = dc_mode_to_region_index(cxled->mode);
> + for (int i = 0; i <= index; i++) {
I'm not sure why this is <= so maybe a comment?
> + struct resource *dcr = &cxlds->dc_res[i];
> +
> + if (skip_base < dcr->start) {
> + skip_len = dcr->start - skip_base;
> + rc = cxl_request_skip(cxled, skip_base, skip_len);
> + if (rc)
> + return rc;
> + skip_base += skip_len;
> + }
> +
> + if (skip_base == base) {
> + dev_dbg(dev, "skip done DC region %d!\n", i);
> + break;
> + }
> +
> + if (resource_size(dcr) && skip_base <= dcr->end) {
> + if (skip_base > base) {
> + dev_err(dev, "Skip error DC region %d; skip_base %pa; base %pa\n",
> + i, &skip_base, &base);
> + return -ENXIO;
> + }
> +
> + skip_len = dcr->end - skip_base + 1;
> + rc = cxl_request_skip(cxled, skip_base, skip_len);
> + if (rc)
> + return rc;
> + skip_base += skip_len;
> + }
> + }
> +
> + return 0;
> +}
> @@ -466,8 +588,8 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
>
> int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> {
> - struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> resource_size_t free_ram_start, free_pmem_start;
> + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
Patch noise. Put it back where it was! (assuming I haven't failed to spot the difference)
> struct cxl_port *port = cxled_to_port(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> struct device *dev = &cxled->cxld.dev;
Powered by blists - more mailing lists