[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200107214801.GN32178@dhcp22.suse.cz>
Date: Tue, 7 Jan 2020 22:48:01 +0100
From: Michal Hocko <mhocko@...nel.org>
To: Scott Cheloha <cheloha@...ux.vnet.ibm.com>
Cc: linux-kernel@...r.kernel.org,
"Rafael J. Wysocki" <rafael@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
David Hildenbrand <david@...hat.com>, nathanl@...ux.ibm.com,
ricklind@...ux.vnet.ibm.com,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to
accelerate lookup
[Cc Andrew]
On Tue 17-12-19 13:32:38, Scott Cheloha wrote:
> Searching for a particular memory block by id is slow because each block
> device is kept in an unsorted linked list on the subsystem bus.
Noting that this is O(N^2) would be useful.
> Lookup is much faster if we cache the blocks in a radix tree.
While this is really easy and straightforward, is there any reason why
subsys_find_device_by_id has to use such a slow lookup? I suspect nobody
simply needed a more optimized data structure for that purpose yet.
Would it be too hard to use radix tree for all lookups rather than
adding a shadow copy for memblocks?
> Memory
> subsystem initialization and hotplug/hotunplug is at least a little faster
> for any machine with more than ~100 blocks, and the speedup grows with
> the block count.
If we are onlining one memblock after another and there are many of them
then this is going to help a some. Some numbers would be really nice.
If there are too many blocks then the biggest part of the overhead is
still there when registering each memblock though because that operation
is just very expensinve (e.g. udev notification).
> Signed-off-by: Scott Cheloha <cheloha@...ux.vnet.ibm.com>
> Acked-by: David Hildenbrand <david@...hat.com>
> ---
> v2 incorporates suggestions from David Hildenbrand.
>
> v3 changes:
> - Rebase atop "drivers/base/memory.c: drop the mem_sysfs_mutex"
>
> - Be conservative: don't use radix_tree_for_each_slot() in
> walk_memory_blocks() yet. It introduces RCU which could
> change behavior. Walking the tree "by hand" with
> find_memory_block_by_id() is slower but keeps the patch
> simple.
>
> drivers/base/memory.c | 36 +++++++++++++++++++++++-------------
> 1 file changed, 23 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 799b43191dea..8902930d5ef2 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -19,6 +19,7 @@
> #include <linux/memory.h>
> #include <linux/memory_hotplug.h>
> #include <linux/mm.h>
> +#include <linux/radix-tree.h>
> #include <linux/stat.h>
> #include <linux/slab.h>
>
> @@ -56,6 +57,13 @@ static struct bus_type memory_subsys = {
> .offline = memory_subsys_offline,
> };
>
> +/*
> + * Memory blocks are cached in a local radix tree to avoid
> + * a costly linear search for the corresponding device on
> + * the subsystem bus.
> + */
> +static RADIX_TREE(memory_blocks, GFP_KERNEL);
> +
> static BLOCKING_NOTIFIER_HEAD(memory_chain);
>
> int register_memory_notifier(struct notifier_block *nb)
> @@ -572,20 +580,14 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
> /* A reference for the returned memory block device is acquired. */
> static struct memory_block *find_memory_block_by_id(unsigned long block_id)
> {
> - struct device *dev;
> + struct memory_block *mem;
>
> - dev = subsys_find_device_by_id(&memory_subsys, block_id, NULL);
> - return dev ? to_memory_block(dev) : NULL;
> + mem = radix_tree_lookup(&memory_blocks, block_id);
> + if (mem)
> + get_device(&mem->dev);
> + return mem;
> }
>
> -/*
> - * For now, we have a linear search to go find the appropriate
> - * memory_block corresponding to a particular phys_index. If
> - * this gets to be a real problem, we can always use a radix
> - * tree or something here.
> - *
> - * This could be made generic for all device subsystems.
> - */
> struct memory_block *find_memory_block(struct mem_section *section)
> {
> unsigned long block_id = base_memory_block_id(__section_nr(section));
> @@ -628,9 +630,15 @@ int register_memory(struct memory_block *memory)
> memory->dev.offline = memory->state == MEM_OFFLINE;
>
> ret = device_register(&memory->dev);
> - if (ret)
> + if (ret) {
> put_device(&memory->dev);
> -
> + return ret;
> + }
> + ret = radix_tree_insert(&memory_blocks, memory->dev.id, memory);
> + if (ret) {
> + put_device(&memory->dev);
> + device_unregister(&memory->dev);
> + }
> return ret;
> }
>
> @@ -688,6 +696,8 @@ static void unregister_memory(struct memory_block *memory)
> if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
> return;
>
> + WARN_ON(radix_tree_delete(&memory_blocks, memory->dev.id) == NULL);
> +
> /* drop the ref. we got via find_memory_block() */
> put_device(&memory->dev);
> device_unregister(&memory->dev);
> --
> 2.24.0
>
--
Michal Hocko
SUSE Labs
Powered by blists - more mailing lists