lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 13 Mar 2020 15:32:52 -0400
From:   Qian Cai <cai@....pw>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        John Stultz <john.stultz@...aro.org>, sboyd@...nel.org,
        rostedt@...dmis.org, hannes@...xchg.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] timer: silenct a lockdep splat with debugobjects



> On Mar 13, 2020, at 2:08 PM, Peter Zijlstra <peterz@...radead.org> wrote:
> 
> On Fri, Mar 13, 2020 at 11:42:21AM -0400, Qian Cai wrote:
>> psi_enqueue() calls add_timer() with pi->lock and rq->lock held which
>> in-turn could allcate with debugobjcets in the locking order,
>> 
>> pi->lock
>>  rq->lock
>>    base->lock
>>      batched_entropy_u32.lock
>> 
>> while the random code could always call into the scheduler via
>> try_to_wake_up() in the locking order,
>> 
>> batched_entropy_u32.lock
>>  pi->lock
>> 
>> Thus, it could generate a lockdep splat below right after boot. Ideally,
>> psi_enqueue() might be able to be called without either pi->lock or
>> rq->lock held, but it is tricky to do.
>> 
>> Since,
>> 
>> 1) debugobjects is only used in a debug kernel.
>> 2) the chance to trigger a real deadlock is relative low.
>> 3) once the splat happened, it will disable lockdep to prevent it from
>>   catching any more important issues later.
>> 
>> just silent the splat by temporarily lettting lockdep ignore lockes
>> inside debug_timer_activate() which sounds like a reasonable tradeoff
>> for debug kernels.
> 
> 
>> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
>> index 4820823515e9..27bfb8376d71 100644
>> --- a/kernel/time/timer.c
>> +++ b/kernel/time/timer.c
>> @@ -1036,7 +1036,13 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
>> 		}
>> 	}
>> 
>> +	/*
>> +	 * It will allocate under rq->lock and trigger a lockdep slat with
>> +	 * random code. Don't disable lockdep with debugobjects.
>> +	 */
>> +	lockdep_off();
>> 	debug_timer_activate(timer);
>> +	lockdep_on();
>> 
>> 	timer->expires = expires;
>> 	/*
> 
> You have to be f'ing kidding me. You've just earned yourself a lifetime
> membership of 'the tinker crew'.
> 
>> 00: [  321.355501] -> #3 (batched_entropy_u32.lock){-.-.}:
>> 00: [  321.355523]        lock_acquire+0x212/0x460
>> 00: [  321.355536]        _raw_spin_lock_irqsave+0xc4/0xe0
>> 00: [  321.355551]        get_random_u32+0x5a/0x138
>> 00: [  321.355564]        new_slab+0x188/0x760
>> 00: [  321.355576]        ___slab_alloc+0x5d2/0x928
>> 00: [  321.355589]        __slab_alloc+0x52/0x88
>> 00: [  321.355801]        kmem_cache_alloc+0x34a/0x558
>> 00: [  321.355819]        fill_pool+0x29e/0x490
>> 00: [  321.355835]        __debug_object_init+0xa0/0x828
>> 00: [  321.355848]        debug_object_activate+0x200/0x368
>> 00: [  321.355864]        add_timer+0x242/0x538
>> 00: [  321.355877]        queue_delayed_work_on+0x13e/0x148
>> 00: [  321.355893]        init_mm_internals+0x4c6/0x550
>> 00: [  321.355905]        kernel_init_freeable+0x224/0x590
>> 00: [  321.355921]        kernel_init+0x22/0x188
>> 00: [  321.355933]        ret_from_fork+0x30/0x34
> 
> Did you actually look at debug_object_activate() and read?
> 
> The only reason that is calling into __debug_object_init() is because it
> hadn't been initialized yet when it got activated. That *immediately*
> should've been a clue.
> 
> You can initialize this stuff early. For instance:
> 
>  INIT_DELAYED_WORK()
>    __INIT_DELAYED_WORK()
>      __init_timer()
>        init_timer_key()
> 	  debug_init()
> 	    debug_timer_init()
> 	      debug_object_init()
> 	        __debug_object_init()
> 
> And we're right at where the above callchain goes wrong.
> 
> Now, it actually looks like kernel/sched/psi.c actually initializes all
> delayed works it uses. This then leaves other random delayed works to
> establish the base->lock <- entropy.lock relation.
> 
> This just means we need to find and kill all such delayed_work users
> that fail to properly initialize their data structure.

Well, in mm/vmstat.c (init_mm_internals) case for example, it was initialized
as a static,

static DECLARE_DEFERRABLE_WORK(shepherd, vmstat_shepherd);

which will not call __debug_object_init(). 



Powered by blists - more mailing lists