Message-ID: <0d6525c1-2e8b-0e5d-7dae-193bf697a4ec@linux.intel.com>
Date: Thu, 20 Sep 2018 18:33:26 -0700
From: Alexander Duyck <alexander.h.duyck@...ux.intel.com>
To: Dan Williams <dan.j.williams@...el.com>
Cc: Linux MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-nvdimm <linux-nvdimm@...ts.01.org>,
Pasha Tatashin <pavel.tatashin@...rosoft.com>,
Michal Hocko <mhocko@...e.com>,
Dave Jiang <dave.jiang@...el.com>,
Ingo Molnar <mingo@...nel.org>,
Dave Hansen <dave.hansen@...el.com>,
Jérôme Glisse <jglisse@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Logan Gunthorpe <logang@...tatee.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: [PATCH v4 5/5] nvdimm: Schedule device registration on node local
to the device
On 9/20/2018 5:36 PM, Dan Williams wrote:
> On Thu, Sep 20, 2018 at 5:26 PM Alexander Duyck
> <alexander.h.duyck@...ux.intel.com> wrote:
>>
>> On 9/20/2018 3:59 PM, Dan Williams wrote:
>>> On Thu, Sep 20, 2018 at 3:31 PM Alexander Duyck
>>> <alexander.h.duyck@...ux.intel.com> wrote:
>>>>
>>>> This patch is meant to force the device registration for nvdimm devices to
>>>> be closer to the actual device. This is achieved by using either the NUMA
>>>> node ID of the region, or of the parent. By doing this we can have
>>>> everything above the region based on the region, and everything below the
>>>> region based on the nvdimm bus.
>>>>
>>>> One additional change I made is that we hold onto a reference to the parent
>>>> while we are going through registration. By doing this we can guarantee we
>>>> can complete the registration before we have the parent device removed.
>>>>
>>>> By guaranteeing NUMA locality I see an improvement of as high as 25% for
>>>> per-node init of a system with 12TB of persistent memory.
>>>>
>>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@...ux.intel.com>
>>>> ---
>>>> drivers/nvdimm/bus.c | 19 +++++++++++++++++--
>>>> 1 file changed, 17 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
>>>> index 8aae6dcc839f..ca935296d55e 100644
>>>> --- a/drivers/nvdimm/bus.c
>>>> +++ b/drivers/nvdimm/bus.c
>>>> @@ -487,7 +487,9 @@ static void nd_async_device_register(void *d, async_cookie_t cookie)
>>>> dev_err(dev, "%s: failed\n", __func__);
>>>> put_device(dev);
>>>> }
>>>> +
>>>> put_device(dev);
>>>> + put_device(dev->parent);
>>>
>>> Good catch. The child does not pin the parent until registration, but
>>> we need to make sure the parent isn't gone while we're waiting for the
>>> registration work to run.
>>>
>>> Let's break this reference count fix out into its own separate patch,
>>> because this looks to be covering a gap that may need to be
>>> recommended for -stable.
>>
>> Okay, I guess I can do that.
>>
>>>
>>>>
>>>> static void nd_async_device_unregister(void *d, async_cookie_t cookie)
>>>> @@ -504,12 +506,25 @@ static void nd_async_device_unregister(void *d, async_cookie_t cookie)
>>>>
>>>> void __nd_device_register(struct device *dev)
>>>> {
>>>> + int node;
>>>> +
>>>> if (!dev)
>>>> return;
>>>> +
>>>> dev->bus = &nvdimm_bus_type;
>>>> + get_device(dev->parent);
>>>> get_device(dev);
>>>> - async_schedule_domain(nd_async_device_register, dev,
>>>> - &nd_async_domain);
>>>> +
>>>> + /*
>>>> + * For a region we can break away from the parent node,
>>>> + * otherwise for all other devices we just inherit the node from
>>>> + * the parent.
>>>> + */
>>>> + node = is_nd_region(dev) ? to_nd_region(dev)->numa_node :
>>>> + dev_to_node(dev->parent);
>>>
>>> Devices already automatically inherit the node of their parent, so I'm
>>> not understanding why this is needed?
>>
>> That doesn't happen until you call device_add, which you don't call
>> until nd_async_device_register. All that has been called on the device
>> up to now is device_initialize which leaves the node at NUMA_NO_NODE.
>
> Ooh, yeah, missed that. I think I'd prefer this policy to be moved out to
> where we set the dev->parent before calling __nd_device_register, or
> at least a comment here about *why* we know region devices are special
> (i.e. because the nd_region_desc specified the node at region creation
> time).
>
Are you talking about pulling the scheduling out, or just adding a node
value to the nd_device_register call so it can be set directly from the
caller?

If you want, I could pull the set_dev_node call out of
nvdimm_bus_uevent and place it in nd_device_register. That should stick,
since the node doesn't get overwritten by the parent if it is set after
device_initialize. If I did that along with the parent bit I was already
doing, then all that would be left to do is use the dev_to_node call on
the device itself.
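
To make the ordering concrete, here is a rough user-space sketch of the node handling discussed above. It is a toy model, not kernel code: struct toy_device, TOY_NO_NODE, and the toy_* functions are hypothetical stand-ins for struct device, NUMA_NO_NODE, device_initialize, device_add, and the registration path. It only assumes the behavior described in this thread, namely that initialization leaves the node unset and that the add step fills in the parent's node only when none has been set yet.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_NODE (-1)	/* hypothetical stand-in for NUMA_NO_NODE */

/* Hypothetical, heavily reduced stand-in for struct device. */
struct toy_device {
	struct toy_device *parent;
	int node;
};

/* Models device_initialize(): the node starts out unset. */
void toy_device_initialize(struct toy_device *dev, struct toy_device *parent)
{
	assert(dev != NULL);
	dev->parent = parent;
	dev->node = TOY_NO_NODE;
}

/*
 * Models the inheritance step of device_add(): the parent's node is
 * copied only if the device does not already have one, so a node set
 * earlier is not overwritten.
 */
void toy_device_add(struct toy_device *dev)
{
	if (dev->parent && dev->node == TOY_NO_NODE)
		dev->node = dev->parent->node;
}

/*
 * Models the proposed registration step: decide the node before the
 * async work is scheduled. A region passes its own node; any other
 * device passes TOY_NO_NODE and takes the node of its parent. The
 * return value is what the scheduler would read back from the device.
 */
int toy_register_node(struct toy_device *dev, int region_node)
{
	if (region_node != TOY_NO_NODE)
		dev->node = region_node;
	else if (dev->parent)
		dev->node = dev->parent->node;
	return dev->node;
}
```

With a bus on node 0 and a region created for node 1, toy_register_node gives the region node 1, the later toy_device_add leaves that value alone, and a namespace below the region picks up node 1 from its parent.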