[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5abb31e1-b5f2-718d-3a48-b0d8a73d6e5c@gmx.us>
Date: Sun, 25 Nov 2018 23:52:24 -0500
From: Qian Cai <cai@....us>
To: Waiman Long <longman@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Yang Shi <yang.shi@...ux.alibaba.com>, arnd@...db.de,
linux kernel <linux-kernel@...r.kernel.org>,
Catalin Marinas <catalin.marinas@....com>
Subject: Re: [PATCH v4] debugobjects: scale the static pool size
On 11/25/18 8:31 PM, Waiman Long wrote:
> On 11/25/2018 03:42 PM, Qian Cai wrote:
>>
>>
>> On 11/23/18 10:01 PM, Qian Cai wrote:
>>>
>>>
>>>> On Nov 22, 2018, at 4:56 PM, Thomas Gleixner <tglx@...utronix.de>
>>>> wrote:
>>>>
>>>> On Tue, 20 Nov 2018, Qian Cai wrote:
>>>>
>>>> Looking deeper at that.
>>>>
>>>>> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
>>>>> index 70935ed91125..140571aa483c 100644
>>>>> --- a/lib/debugobjects.c
>>>>> +++ b/lib/debugobjects.c
>>>>> @@ -23,9 +23,81 @@
>>>>> #define ODEBUG_HASH_BITS 14
>>>>> #define ODEBUG_HASH_SIZE (1 << ODEBUG_HASH_BITS)
>>>>>
>>>>> -#define ODEBUG_POOL_SIZE 1024
>>>>> +#define ODEBUG_DEFAULT_POOL 512
>>>>> #define ODEBUG_POOL_MIN_LEVEL 256
>>>>>
>>>>> +/*
>>>>> + * Some debug objects are allocated during the early boot.
>>>>> Enabling some options
>>>>> + * like timers or workqueue objects may increase the size required
>>>>> significantly
>>>>> + * with large number of CPUs. For example (as today, 20 Nov. 2018),
>>>>> + *
>>>>> + * No. CPUs x 2 (worker pool) objects:
>>>>> + *
>>>>> + * start_kernel
>>>>> + * workqueue_init_early
>>>>> + * init_worker_pool
>>>>> + * init_timer_key
>>>>> + * debug_object_init
>>>>> + *
>>>>> + * No. CPUs objects (CONFIG_HIGH_RES_TIMERS):
>>>>> + *
>>>>> + * sched_init
>>>>> + * hrtick_rq_init
>>>>> + * hrtimer_init
>>>>> + *
>>>>> + * CONFIG_DEBUG_OBJECTS_WORK:
>>>>> + * No. CPUs x 6 (workqueue) objects:
>>>>> + *
>>>>> + * workqueue_init_early
>>>>> + * alloc_workqueue
>>>>> + * __alloc_workqueue_key
>>>>> + * alloc_and_link_pwqs
>>>>> + * init_pwq
>>>>> + *
>>>>> + * Also, plus No. CPUs objects:
>>>>> + *
>>>>> + * perf_event_init
>>>>> + * __init_srcu_struct
>>>>> + * init_srcu_struct_fields
>>>>> + * init_srcu_struct_nodes
>>>>> + * __init_work
>>>>
>>>> None of the things are actually used or required _BEFORE_
>>>> debug_objects_mem_init() is invoked.
>>>>
>>>> The reason why the call is at this place in start_kernel() is
>>>> historical. It's because back in the days when debugobjects were
>>>> added the
>>>> memory allocator was enabled way later than today. So we can just
>>>> move the
>>>> debug_objects_mem_init() call right before sched_init() I think.
>>>
>>> Well, now that kmemleak_init() seems complains that
>>> debug_objects_mem_init()
>>> is called before it.
>>>
>>> [ 0.078805] kmemleak: Cannot insert 0xc000000dff930000 into the
>>> object search tree (overlaps existing)
>>> [ 0.078860] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc3+ #3
>>> [ 0.078883] Call Trace:
>>> [ 0.078904] [c000000001c8fcd0] [c000000000c96b34]
>>> dump_stack+0xe8/0x164 (unreliable)
>>> [ 0.078935] [c000000001c8fd20] [c000000000486e84]
>>> create_object+0x344/0x380
>>> [ 0.078962] [c000000001c8fde0] [c000000000489544]
>>> early_alloc+0x108/0x1f8
>>> [ 0.078989] [c000000001c8fe20] [c00000000109738c]
>>> kmemleak_init+0x1d8/0x3d4
>>> [ 0.079016] [c000000001c8ff00] [c000000001054028]
>>> start_kernel+0x5c0/0x6f8
>>> [ 0.079043] [c000000001c8ff90] [c00000000000ae7c]
>>> start_here_common+0x1c/0x520
>>> [ 0.079070] kmemleak: Kernel memory leak detector disabled
>>> [ 0.079091] kmemleak: Object 0xc000000ffd587b68 (size 40):
>>> [ 0.079112] kmemleak: comm "swapper/0", pid 0, jiffies 4294937299
>>> [ 0.079135] kmemleak: min_count = -1
>>> [ 0.079153] kmemleak: count = 0
>>> [ 0.079170] kmemleak: flags = 0x5
>>> [ 0.079188] kmemleak: checksum = 0
>>> [ 0.079206] kmemleak: backtrace:
>>> [ 0.079227] __debug_object_init+0x688/0x700
>>> [ 0.079250] debug_object_activate+0x1e0/0x350
>>> [ 0.079272] __call_rcu+0x60/0x430
>>> [ 0.079292] put_object+0x60/0x80
>>> [ 0.079311] kmemleak_init+0x2cc/0x3d4
>>> [ 0.079331] start_kernel+0x5c0/0x6f8
>>> [ 0.079351] start_here_common+0x1c/0x520
>>> [ 0.079380] kmemleak: Early log backtrace:
>>> [ 0.079399] memblock_alloc_try_nid_raw+0x90/0xcc
>>> [ 0.079421] sparse_init_nid+0x144/0x51c
>>> [ 0.079440] sparse_init+0x1a0/0x238
>>> [ 0.079459] initmem_init+0x1d8/0x25c
>>> [ 0.079498] setup_arch+0x3e0/0x464
>>> [ 0.079517] start_kernel+0xa4/0x6f8
>>> [ 0.079536] start_here_common+0x1c/0x520
>>>
>>
>> So this is an chicken-egg problem. Debug objects need kmemleak_init()
>> first, so it can make use of kmemleak_ignore() for all debug objects
>> in order to avoid the overlapping like the above.
>>
>> while (obj_pool_free < debug_objects_pool_min_level) {
>>
>> new = kmem_cache_zalloc(obj_cache, gfp);
>> if (!new)
>> return;
>>
>> kmemleak_ignore(new);
>>
>> However, there seems no way to move kmemleak_init() together this
>> early in start_kernel() just before vmalloc_init() [1] because it
>> looks like it depends on things like workqueue
>> (schedule_work(&cleanup_work)) and rcu. Hence, it needs to be after
>> workqueue_init_early() and rcu_init()
>>
>> Given that, maybe the best outcome is to stick to the alternative
>> approach that works [1] rather messing up with the order of
>> debug_objects_mem_init() in start_kernel() which seems tricky. What do
>> you think?
>>
>> [1] https://goo.gl/18N78g
>> [2] https://goo.gl/My6ig6
>
> Could you move kmemleak_init() and debug_objects_mem_init() as far up as
> possible, like before the hrtimer_init() to at least make static count
> calculation as simple as possible?
>
Well, there is only 2 x NR_CPUS difference after moved both calls just after
rcu_init().
Before After
64-CPU: 1114 974
160-CPU: 2774 2429
256-CPU: 3853 4378
I suppose it is possible that the timers only need the scale factor 5 instead of
10. However, it needs to be retested for all the configurations to be sure, and
likely need to remove all irqs calls in kmemleak_init() and subroutines because
it is now called with irq disabled. Given the initdata will be freed anyway,
does it really worth doing?
BTW, calling debug_objects_mem_init() before kmemleak_init() actually could
trigger a loop on machines with 160+ CPUs until the pool is filled up,
debug_objects_pool_min_level += num_possible_cpus() * 4;
[1] while (obj_pool_free < debug_objects_pool_min_level)
kmemleak_init
kmemleak_ignore (from replaced static debug objects)
make_black_object
put_object
__call_rcu (kernel/rcu/tree.c)
debug_rcu_head_queue
debug_object_activate
debug_object_init
fill_pool
kmemleak_ignore (looping in [1])
make_black_object
...
I think until this is resolved, there is no way to move debug_objects_mem_init()
before kmemleak_init().
Powered by blists - more mailing lists