Message-ID: <AM6PR08MB437663A6F8ABE7FCBC22B4E0F7EB9@AM6PR08MB4376.eurprd08.prod.outlook.com>
Date:   Thu, 29 Jul 2021 00:20:38 +0000
From:   Justin He <Justin.He@....com>
To:     David Hildenbrand <david@...hat.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Vishal Verma <vishal.l.verma@...el.com>,
        Dave Jiang <dave.jiang@...el.com>
CC:     "nvdimm@...ts.linux.dev" <nvdimm@...ts.linux.dev>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        nd <nd@....com>
Subject: RE: [PATCH] device-dax: use fallback nid when numa_node is invalid

Hi David,

> -----Original Message-----
> From: David Hildenbrand <david@...hat.com>
> Sent: Thursday, July 29, 2021 4:17 AM
> To: Justin He <Justin.He@....com>; Dan Williams <dan.j.williams@...el.com>;
> Vishal Verma <vishal.l.verma@...el.com>; Dave Jiang <dave.jiang@...el.com>
> Cc: nvdimm@...ts.linux.dev; linux-kernel@...r.kernel.org; nd <nd@....com>
> Subject: Re: [PATCH] device-dax: use fallback nid when numa_node is invalid
> 
> On 28.07.21 10:22, Jia He wrote:
> > Previously, numa_off was set unconditionally in dummy_numa_init()
> > even with a fake numa node. Then ACPI set node id as NUMA_NO_NODE(-1)
> > after acpi_map_pxm_to_node() because it regards numa_off as turning
> > off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
> > arm64 with fake numa.
> >
> > Without this patch, pmem can't be probed as a RAM device on arm64 if
> > SRAT table isn't present:
> >    $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
> >    kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
> >    kmem: probe of dax0.0 failed with error -22
> >
> > This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
> >
> > Suggested-by: David Hildenbrand <david@...hat.com>
> > Signed-off-by: Jia He <justin.he@....com>
> > ---
> >   drivers/dax/kmem.c | 36 ++++++++++++++++++++----------------
> >   1 file changed, 20 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index ac231cc36359..749674909e51 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -46,20 +46,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> >   	struct dax_kmem_data *data;
> >   	int rc = -ENOMEM;
> >   	int i, mapped = 0;
> > -	int numa_node;
> > -
> > -	/*
> > -	 * Ensure good NUMA information for the persistent memory.
> > -	 * Without this check, there is a risk that slow memory
> > -	 * could be mixed in a node with faster memory, causing
> > -	 * unavoidable performance issues.
> > -	 */
> > -	numa_node = dev_dax->target_node;
> > -	if (numa_node < 0) {
> > -		dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
> > -				numa_node);
> > -		return -EINVAL;
> > -	}
> > +	int numa_node = dev_dax->target_node, new_node;
> >
> >   	data = kzalloc(struct_size(data, res, dev_dax->nr_range), GFP_KERNEL);
> >   	if (!data)
> > @@ -104,6 +91,20 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> >   		 */
> >   		res->flags = IORESOURCE_SYSTEM_RAM;
> >
> > +		/*
> > +		 * Ensure good NUMA information for the persistent memory.
> > +		 * Without this check, there is a risk but not fatal that slow
> > +		 * memory could be mixed in a node with faster memory, causing
> > +		 * unavoidable performance issues. Furthermore, fallback node
> > +		 * id can be used when numa_node is invalid.
> > +		 */
> > +		if (numa_node < 0) {
> > +			new_node = memory_add_physaddr_to_nid(range.start);
> > +			dev_info(dev, "changing nid from %d to %d for DAX region %pR\n",
> > +				numa_node, new_node, res);
> > +			numa_node = new_node;
> > +		}
> > +
> >   		/*
> >   		 * Ensure that future kexec'd kernels will not treat
> >   		 * this as RAM automatically.
> > @@ -141,6 +142,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
> >   	int i, success = 0;
> >   	struct device *dev = &dev_dax->dev;
> >   	struct dax_kmem_data *data = dev_get_drvdata(dev);
> > +	int numa_node = dev_dax->target_node;
> >
> >   	/*
> >   	 * We have one shot for removing memory, if some memory blocks were not
> > @@ -156,8 +158,10 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
> >   		if (rc)
> >   			continue;
> >
> > -		rc = remove_memory(dev_dax->target_node, range.start,
> > -				range_len(&range));
> > +		if (numa_node < 0)
> > +			numa_node = memory_add_physaddr_to_nid(range.start);
> > +
> > +		rc = remove_memory(numa_node, range.start, range_len(&range));
> >   		if (rc == 0) {
> >   			release_resource(data->res[i]);
> >   			kfree(data->res[i]);
> >
> 
> Note that this patch conflicts with:
> 
> https://lkml.kernel.org/r/20210723125210.29987-7-david@redhat.com
> 
> But nothing fundamental. Determining a single NID is similar to how I'm
> handling it for ACPI:
> 
> https://lkml.kernel.org/r/20210723125210.29987-6-david@redhat.com
> 

Okay, got it. Thanks for the reminder.
It seems my patch is no longer needed once your patch is applied.
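
For reference, the fallback idea in the quoted patch can be sketched in
userspace roughly as below. This is only an illustration, not the kernel
code: example_physaddr_to_nid(), resolve_nid() and the address table are
hypothetical stand-ins (the real lookup is memory_add_physaddr_to_nid(),
whose result depends on the architecture), and the node boundaries are
made up.

```c
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical address-to-node table; the boundaries are invented. */
struct node_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
	int nid;
};

static const struct node_range example_nodes[] = {
	{ 0x000000000ULL, 0x1ffffffffULL, 0 },
	{ 0x200000000ULL, 0x3ffffffffULL, 1 },
};

/*
 * Hypothetical stand-in for memory_add_physaddr_to_nid(): look up the
 * node that covers addr, defaulting to node 0 when nothing matches.
 */
static int example_physaddr_to_nid(uint64_t addr)
{
	size_t i;

	for (i = 0; i < sizeof(example_nodes) / sizeof(example_nodes[0]); i++) {
		if (addr >= example_nodes[i].start &&
		    addr <= example_nodes[i].end)
			return example_nodes[i].nid;
	}
	return 0;
}

/*
 * Mirror of the check in the patch: keep a valid target node, otherwise
 * derive the node id from the start of the range.
 */
static int resolve_nid(int target_node, uint64_t range_start)
{
	if (target_node < 0)
		return example_physaddr_to_nid(range_start);
	return target_node;
}
```

With this made-up table, resolve_nid(NUMA_NO_NODE, 0x240400000ULL) picks
node 1, while a valid target_node is passed through unchanged.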


--
Cheers,
Justin (Jia He)

