lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d70f2ed0-8e4e-1548-a963-df6e06597223@redhat.com>
Date:   Fri, 20 Dec 2019 11:50:54 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Scott Cheloha <cheloha@...ux.vnet.ibm.com>
Cc:     linux-kernel@...r.kernel.org,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        Oscar Salvador <osalvador@...e.de>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>
Subject: Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to
 accelerate lookup

On 19.12.19 18:33, Scott Cheloha wrote:
> On Wed, Dec 18, 2019 at 10:00:49AM +0100, David Hildenbrand wrote:
>> On 17.12.19 20:32, Scott Cheloha wrote:
>>> Searching for a particular memory block by id is slow because each block
>>> device is kept in an unsorted linked list on the subsystem bus.
>>>
>>> Lookup is much faster if we cache the blocks in a radix tree.  Memory
>>> subsystem initialization and hotplug/hotunplug is at least a little faster
>>> for any machine with more than ~100 blocks, and the speedup grows with
>>> the block count.
>>>
>>> Signed-off-by: Scott Cheloha <cheloha@...ux.vnet.ibm.com>
>>> Acked-by: David Hildenbrand <david@...hat.com>
>>> ---
>>> v2 incorporates suggestions from David Hildenbrand.
>>>
>>> v3 changes:
>>>   - Rebase atop "drivers/base/memory.c: drop the mem_sysfs_mutex"
>>>
>>>   - Be conservative: don't use radix_tree_for_each_slot() in
>>>     walk_memory_blocks() yet.  It introduces RCU which could
>>>     change behavior.  Walking the tree "by hand" with
>>>     find_memory_block_by_id() is slower but keeps the patch
>>>     simple.
>>
>> Fine with me (splitting it out, e.g., into an addon patch), however, as
>> readers/writers run mutually exclusive, there is nothing to worry about
>> here. RCU will not make a difference.
> 
> Oh.  In that case, can you make heads or tails of this CI failure
> email I got about the v2 patch?  I've inlined it at the end of this
> mail.  "Suspicious RCU usage".  Unclear if it's a false positive.  My
> thinking was that I'd hold off on using radix_tree_for_each_slot() and
> introducing a possible regression until I understood what was
> triggering the robot.

Ah, did not see that message. See below.

> 
> Also, is there anyone else I should shop this patch to?  I copied the
> maintainers reported by scripts/get_maintainer.pl but are there others
> who might be interested?

On memory hotplug (and related) things, it's usually a good idea to CC
Andrew (who picks up basically all MM stuff), Michal Hocko, and Oscar
Salvador. (cc-ing them)

> 
> --
> 
> Here's that CI mail.  I've stripped the attached config because it's
> huge.
> 
> Date: Sun, 1 Dec 2019 23:05:23 +0800
> From: kernel test robot <lkp@...el.com>
> To: Scott Cheloha <cheloha@...ux.vnet.ibm.com>
> Cc: 0day robot <lkp@...el.com>, LKP <lkp@...ts.01.org>
> Subject: 86dc301f7b ("drivers/base/memory.c: cache blocks in radix tree .."):
>  [    1.341517] WARNING: suspicious RCU usage
> Message-ID: <20191201150523.GE18573@...o2-debian>
> 
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> https://github.com/0day-ci/linux/commits/Scott-Cheloha/drivers-base-memory-c-cache-blocks-in-radix-tree-to-accelerate-lookup/20191124-104557
> 
> commit 86dc301f7b4815d90e3a7843ffed655d64efe445
> Author:     Scott Cheloha <cheloha@...ux.vnet.ibm.com>
> AuthorDate: Thu Nov 21 13:59:52 2019 -0600
> Commit:     0day robot <lkp@...el.com>
> CommitDate: Sun Nov 24 10:45:59 2019 +0800
> 
>     drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
>     
>     Searching for a particular memory block by id is slow because each block
>     device is kept in an unsorted linked list on the subsystem bus.
>     
>     Lookup is much faster if we cache the blocks in a radix tree.  Memory
>     subsystem initialization and hotplug/hotunplug is at least a little faster
>     for any machine with more than ~100 blocks, and the speedup grows with
>     the block count.
>     
>     Signed-off-by: Scott Cheloha <cheloha@...ux.vnet.ibm.com>
>     Acked-by: David Hildenbrand <david@...hat.com>
> 
> 0e4a459f56  tracing: Remove unnecessary DEBUG_FS dependency
> 86dc301f7b  drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
> +---------------------------------------------------------------------+------------+------------+
> |                                                                     | 0e4a459f56 | 86dc301f7b |
> +---------------------------------------------------------------------+------------+------------+
> | boot_successes                                                      | 8          | 0          |
> | boot_failures                                                       | 25         | 11         |
> | WARNING:possible_circular_locking_dependency_detected               | 25         | 8          |
> | WARNING:suspicious_RCU_usage                                        | 0          | 11         |
> | include/linux/radix-tree.h:#suspicious_rcu_dereference_check()usage | 0          | 11         |
> +---------------------------------------------------------------------+------------+------------+
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <lkp@...el.com>
> 
> [    1.335279] random: get_random_bytes called from kcmp_cookies_init+0x29/0x4c with crng_init=0
> [    1.336883] ACPI: bus type PCI registered
> [    1.338295] PCI: Using configuration type 1 for base access
> [    1.340735] 
> [    1.341049] =============================
> [    1.341517] WARNING: suspicious RCU usage
> [    1.342266] 5.4.0-rc5-00070-g86dc301f7b481 #1 Tainted: G                T
> [    1.343494] -----------------------------
> [    1.344226] include/linux/radix-tree.h:167 suspicious rcu_dereference_check() usage!
> [    1.345516] 
> [    1.345516] other info that might help us debug this:
> [    1.345516] 
> [    1.346962] 
> [    1.346962] rcu_scheduler_active = 2, debug_locks = 1
> [    1.348134] no locks held by swapper/0/1.
> [    1.348866] 
> [    1.348866] stack backtrace:
> [    1.349525] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                T 5.4.0-rc5-00070-g86dc301f7b481 #1
> [    1.351230] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [    1.352720] Call Trace:
> [    1.353187]  ? dump_stack+0x9a/0xde
> [    1.353507]  ? node_access_release+0x19/0x19
> [    1.353507]  ? walk_memory_blocks+0xe6/0x184
> [    1.353507]  ? set_debug_rodata+0x20/0x20
> [    1.353507]  ? link_mem_sections+0x39/0x3d
> [    1.353507]  ? topology_init+0x74/0xc8
> [    1.353507]  ? enable_cpu0_hotplug+0x15/0x15
> [    1.353507]  ? do_one_initcall+0x13d/0x30a
> [    1.353507]  ? kernel_init_freeable+0x18e/0x23b
> [    1.353507]  ? rest_init+0x173/0x173
> [    1.353507]  ? kernel_init+0x10/0x151
> [    1.353507]  ? rest_init+0x173/0x173
> [    1.353507]  ? ret_from_fork+0x3a/0x50
> [    1.410829] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
> [    1.427389] cryptd: max_cpu_qlen set to 1000
> [    1.457792] ACPI: Added _OSI(Module Device)
> [    1.458615] ACPI: Added _OSI(Processor Device)
> [    1.459428] ACPI: Added _OSI(3.0 _SCP Extensions)
> 


We get a complaint that we do a rcu_dereference() without proper protection.

We come via

rest_init()->kernel_init()->kernel_init_freeable()->do_basic_setup()->
do_initcalls()->do_initcall_level()->do_one_initcall()->
topology_init()->register_one_node()->link_mem_sections()->
walk_memory_blocks()

E.g., we add ACPI memory similarly via
    ...do_initcalls()...acpi_init()->acpi_scan_init()
also without holding the device hotplug lock. AFAIK, we can't get
races/concurrent hot(un)plug activity here (we even documented that
for ACPI). And if we would, the code would already be wrong ;)

I assume the simplest and cheapest way to suppress the warning would be
to add rcu_read_lock()/rcu_read_unlock() around the
radix_tree_for_each_slot().

Another way to suppress the warning would be to take the device hotplug
lock before calling register_one_node() in all arch specific code - but
we had a similar discussion due to the ACPI code back then and decided
to not take the device hotplug lock if not really necessary.

... but after all we would want a radix_tree_for_each_slot() variant
that does not perform such checks here. We don't hold any locks and
don't need any locks.

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ