Message-ID: <fc31c6ab-d147-10c0-7678-d820bc8ec96e@redhat.com>
Date: Wed, 28 Jul 2021 22:17:17 +0200
From: David Hildenbrand <david@...hat.com>
To: Jia He <justin.he@....com>,
Dan Williams <dan.j.williams@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>,
Dave Jiang <dave.jiang@...el.com>
Cc: nvdimm@...ts.linux.dev, linux-kernel@...r.kernel.org, nd@....com
Subject: Re: [PATCH] device-dax: use fallback nid when numa_node is invalid

On 28.07.21 10:22, Jia He wrote:
> Previously, numa_off was set unconditionally in dummy_numa_init()
> even with a fake numa node. Then ACPI set node id as NUMA_NO_NODE(-1)
> after acpi_map_pxm_to_node() because it regards numa_off as turning
> off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
> arm64 with fake numa.
>
> Without this patch, pmem can't be probed as a RAM device on arm64 if
> SRAT table isn't present:
> $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
> kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
> kmem: probe of dax0.0 failed with error -22
>
> This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
>
> Suggested-by: David Hildenbrand <david@...hat.com>
> Signed-off-by: Jia He <justin.he@....com>
> ---
> drivers/dax/kmem.c | 36 ++++++++++++++++++++----------------
> 1 file changed, 20 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index ac231cc36359..749674909e51 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -46,20 +46,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> struct dax_kmem_data *data;
> int rc = -ENOMEM;
> int i, mapped = 0;
> - int numa_node;
> -
> - /*
> - * Ensure good NUMA information for the persistent memory.
> - * Without this check, there is a risk that slow memory
> - * could be mixed in a node with faster memory, causing
> - * unavoidable performance issues.
> - */
> - numa_node = dev_dax->target_node;
> - if (numa_node < 0) {
> - dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
> - numa_node);
> - return -EINVAL;
> - }
> + int numa_node = dev_dax->target_node, new_node;
>
> data = kzalloc(struct_size(data, res, dev_dax->nr_range), GFP_KERNEL);
> if (!data)
> @@ -104,6 +91,20 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> */
> res->flags = IORESOURCE_SYSTEM_RAM;
>
> + /*
> + * Ensure good NUMA information for the persistent memory.
> + * Without this check, there is a risk but not fatal that slow
> + * memory could be mixed in a node with faster memory, causing
> + * unavoidable performance issues. Furthermore, fallback node
> + * id can be used when numa_node is invalid.
> + */
> + if (numa_node < 0) {
> + new_node = memory_add_physaddr_to_nid(range.start);
> + dev_info(dev, "changing nid from %d to %d for DAX region %pR\n",
> + numa_node, new_node, res);
> + numa_node = new_node;
> + }
> +
> /*
> * Ensure that future kexec'd kernels will not treat
> * this as RAM automatically.
> @@ -141,6 +142,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
> int i, success = 0;
> struct device *dev = &dev_dax->dev;
> struct dax_kmem_data *data = dev_get_drvdata(dev);
> + int numa_node = dev_dax->target_node;
>
> /*
> * We have one shot for removing memory, if some memory blocks were not
> @@ -156,8 +158,10 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
> if (rc)
> continue;
>
> - rc = remove_memory(dev_dax->target_node, range.start,
> - range_len(&range));
> + if (numa_node < 0)
> + numa_node = memory_add_physaddr_to_nid(range.start);
> +
> + rc = remove_memory(numa_node, range.start, range_len(&range));
> if (rc == 0) {
> release_resource(data->res[i]);
> kfree(data->res[i]);
>

Note that this patch conflicts with:

https://lkml.kernel.org/r/20210723125210.29987-7-david@redhat.com

The conflict is nothing fundamental, though. Determining a single NID
for the whole device is similar to how I'm handling it for ACPI:

https://lkml.kernel.org/r/20210723125210.29987-6-david@redhat.com
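
To illustrate what "determining a single NID" could look like, here is a
minimal userspace sketch. It is not the code from either series.
dax_pick_nid() and stub_physaddr_to_nid() are hypothetical names, the stub
only stands in for the kernel's memory_add_physaddr_to_nid(), and its
4 GiB node split is an arbitrary assumption chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)

/* Simplified stand-in for the kernel's struct range (inclusive end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical stub for memory_add_physaddr_to_nid(). Assumption for
 * this sketch only: addresses below 4 GiB belong to node 0, all other
 * addresses belong to node 1.
 */
static int stub_physaddr_to_nid(uint64_t start)
{
	return start < (1ULL << 32) ? 0 : 1;
}

/*
 * Pick one nid for the whole device. A valid target_node is kept as-is;
 * otherwise the nid is derived once, from the start of the first range,
 * so that add and remove paths can agree on the same node.
 */
static int dax_pick_nid(int target_node, const struct range *ranges,
			size_t nr_range)
{
	assert(ranges != NULL && nr_range > 0);

	if (target_node >= 0)
		return target_node;

	return stub_physaddr_to_nid(ranges[0].start);
}
```

Computing the nid once, up front, means every range of the device is
added to and removed from the same node, instead of re-deriving the value
inside the per-range loop.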
--
Thanks,
David / dhildenb