Message-ID: <20200110113137.GE29802@dhcp22.suse.cz>
Date:   Fri, 10 Jan 2020 12:31:37 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     David Hildenbrand <david@...hat.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Scott Cheloha <cheloha@...ux.vnet.ibm.com>,
        linux-kernel@...r.kernel.org,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        nathanl@...ux.ibm.com, ricklind@...ux.vnet.ibm.com,
        Scott Cheloha <cheloha@...ux.ibm.com>
Subject: Re: [PATCH v4] drivers/base/memory.c: cache blocks in radix tree to
 accelerate lookup

On Fri 10-01-20 10:32:01, David Hildenbrand wrote:
> On 09.01.20 23:35, David Hildenbrand wrote:
> >> Am 09.01.2020 um 23:28 schrieb Andrew Morton <akpm@...ux-foundation.org>:
> >>
> >> On Thu, 9 Jan 2020 23:17:09 +0100 David Hildenbrand <david@...hat.com> wrote:
> >>
> >>>
> >>>
> >>>>> Am 09.01.2020 um 23:00 schrieb Andrew Morton <akpm@...ux-foundation.org>:
> >>>>
> >>>> On Thu,  9 Jan 2020 15:25:16 -0600 Scott Cheloha <cheloha@...ux.vnet.ibm.com> wrote:
> >>>>
> >>>>> Searching for a particular memory block by id is an O(n) operation
> >>>>> because each memory block's underlying device is kept in an unsorted
> >>>>> linked list on the subsystem bus.
> >>>>>
> >>>>> We can cut the lookup cost to O(log n) if we cache the memory blocks in
> >>>>> a radix tree.  With a radix tree cache in place both memory subsystem
> >>>>> initialization and memory hotplug run palpably faster on systems with a
> >>>>> large number of memory blocks.
> >>>>>
> >>>>> ...
> >>>>>
> >>>>> @@ -56,6 +57,13 @@ static struct bus_type memory_subsys = {
> >>>>>   .offline = memory_subsys_offline,
> >>>>> };
> >>>>>
> >>>>> +/*
> >>>>> + * Memory blocks are cached in a local radix tree to avoid
> >>>>> + * a costly linear search for the corresponding device on
> >>>>> + * the subsystem bus.
> >>>>> + */
> >>>>> +static RADIX_TREE(memory_blocks, GFP_KERNEL);
> >>>>
> >>>> What protects this tree from racy accesses?
> >>>
> >>> I think the device hotplug lock does, currently (except during boot, where no races can happen).
> >>>
> >>
> >> So this?
> >>
> >> --- a/drivers/base/memory.c~drivers-base-memoryc-cache-blocks-in-radix-tree-to-accelerate-lookup-fix
> >> +++ a/drivers/base/memory.c
> >> @@ -61,6 +61,9 @@ static struct bus_type memory_subsys = {
> >>  * Memory blocks are cached in a local radix tree to avoid
> >>  * a costly linear search for the corresponding device on
> >>  * the subsystem bus.
> >> + *
> >> + * Protected by mem_hotplug_lock in mem_hotplug_begin(), and by the guaranteed
> >> + * single-threadedness at boot time.
> >>  */
> >> static RADIX_TREE(memory_blocks, GFP_KERNEL);
> >>
> >>
> >> But are we sure this is all true?
> > 
> > I think the device hotplug lock, not the memory hotplug lock. Will double check later.
> 
> So all writers either hold the device_hotplug_lock or run during boot.
> Documented e.g., for memory_dev_init(), create_memory_block_devices(),
> remove_memory_block_devices().
> 
> The readers are mainly
> - find_memory_block()
>  -> called via online_pages()/offline_pages() where we hold the
>     device_hotplug_lock
>  -> called from arch/powerpc/platforms/pseries/hotplug-memory.c,
>     where we always hold the device_hotplug_lock
> - walk_memory_blocks()
>  -> Callers in drivers/acpi/acpi_memhotplug.c either hold the
>     device_hotplug_lock or are called early during boot
>  -> Callers in mm/memory_hotplug.c either hold the
>     device_hotplug_lock or are called early during boot
>  -> link_mem_sections() is called either early during boot or via
>     add_memory_resource() (whereby all callers either hold the
>     device_hotplug_lock or are called early during boot)
>  -> Callers in arch/powerpc/platforms/powernv/memtrace.c hold the
>     device_hotplug_lock
> 
> So we are fine.
> 
> I suggest we document that expected behavior via

Thanks for documenting this! Adding a comment as you suggest makes
sense. Overall, the locking expectations are similar to
subsys_find_device_by_id(), which doesn't use any internal locking, so a
radix tree that follows the lifetime of those objects should be
compatible with the current implementation (no new races, at least).
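
For reference, a rough sketch of the two lookup paths being compared --
not the exact v4 hunk, just my reading of it, with names taken from the
quoted code (to_memory_block() assumed to be the usual container_of()
helper in memory.c, and the OLD_LOOKUP #ifdef only there to show both
variants in one place):

static struct memory_block *find_memory_block_by_id(unsigned long block_id)
{
#ifdef OLD_LOOKUP
	/* old: O(n) walk of the unsorted device list on the subsystem bus */
	struct device *dev = subsys_find_device_by_id(&memory_subsys,
						       block_id, NULL);

	return dev ? to_memory_block(dev) : NULL;
#else
	/* new: O(log n) lookup in the local radix tree cache */
	return radix_tree_lookup(&memory_blocks, block_id);
#endif
}

Neither path takes a lock of its own; both rely on the callers
(device_hotplug_lock, or single-threadedness at boot) for serialization
against block creation/removal. The sketch also glosses over the device
reference that subsys_find_device_by_id() takes via get_device().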

> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 799b43191dea..8c8dc081597e 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -585,6 +585,8 @@ static struct memory_block *find_memory_block_by_id(unsigned long block_id)
>   * tree or something here.
>   *
>   * This could be made generic for all device subsystems.
> + *
> + * Called under device_hotplug_lock.
>   */
>  struct memory_block *find_memory_block(struct mem_section *section)
>  {
> @@ -837,6 +839,8 @@ void __init memory_dev_init(void)
>   *
>   * In case func() returns an error, walking is aborted and the error is
>   * returned.
> + *
> + * Called under device_hotplug_lock.
>   */
>  int walk_memory_blocks(unsigned long start, unsigned long size,
>                        void *arg, walk_memory_blocks_func_t func)
> 
> 
> Please note that the memory hotplug lock is not safe on the reader side,
> and also not on the writer side after
> https://lkml.kernel.org/r/157863061737.2230556.3959730620803366776.stgit@dwillia2-desk3.amr.corp.intel.com
> 
> 
> -- 
> Thanks,
> 
> David / dhildenb

-- 
Michal Hocko
SUSE Labs
