linux-kernel - Re: [ANNOUNCE] 3.14-rt1

Message-ID: <1398696191.14475.14.camel@marge.simpson.net>
Date:	Mon, 28 Apr 2014 16:43:11 +0200
From:	Mike Galbraith <umgwanakikbuti@...il.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Nicholas Mc Guire <der.herr@...r.at>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	linux-rt-users <linux-rt-users@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	John Kacur <jkacur@...hat.com>
Subject: Re: [ANNOUNCE] 3.14-rt1

On Mon, 2014-04-28 at 16:37 +0200, Mike Galbraith wrote: 
> On Mon, 2014-04-28 at 10:18 -0400, Steven Rostedt wrote: 
> > On Mon, 28 Apr 2014 11:09:46 +0200
> > Mike Galbraith <umgwanakikbuti@...il.com> wrote:
> >  
> > > migrate_disable-pushd-down-in-atomic_dec_and_spin_lo.patch
> > > 
> > > bug: migrate_disable() after blocking is too late.
> > > 
> > > @@ -1028,12 +1028,12 @@ int atomic_dec_and_spin_lock(atomic_t *a
> > >         /* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
> > >         if (atomic_add_unless(atomic, -1, 1))
> > >                 return 0;
> > > -       migrate_disable();
> > >         rt_spin_lock(lock);
> > > -       if (atomic_dec_and_test(atomic))
> > > +       if (atomic_dec_and_test(atomic)){
> > > +               migrate_disable();
> > 
> > Makes sense, as the CPU can go offline right after the lock is grabbed
> > and before the migrate_disable() is called.
> > 
> > Seems that migrate_disable() must be called before taking the lock as
> > it is done in every other location.
> 
> And for tasklist_lock, it seems you MUST also do that prior to trylock,
> else you'll run afoul of the hotplug beast.
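
[Editor's sketch] The ordering argued for in the quoted exchange (pin the task with migrate_disable() before taking a lock that may sleep on -rt) can be modelled in userspace. This is only an illustration under stated assumptions: every function below is a hypothetical stand-in that borrows the kernel primitive's name, and model_atomic_dec_and_spin_lock(), dec_unless_one(), migrate_disable_depth and lock_held do not exist in the kernel. The stub rt_spin_lock() asserts the invariant, so the ordering from the quoted diff (lock first, migrate_disable() afterwards) would trip the assertion.

```c
/* Userspace model only: the functions below are hypothetical stand-ins
 * named after -rt kernel primitives; none of this is kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int migrate_disable_depth;	/* models the nesting count of migrate_disable() */
static bool lock_held;			/* models the single lock used in this sketch */

static void migrate_disable(void) { migrate_disable_depth++; }

static void migrate_enable(void)
{
	assert(migrate_disable_depth > 0);
	migrate_disable_depth--;
}

/* A sleeping spinlock may block, so the caller must already be pinned. */
static void rt_spin_lock(void)
{
	assert(migrate_disable_depth > 0);
	assert(!lock_held);
	lock_held = true;
}

static void rt_spin_unlock(void)
{
	assert(lock_held);
	lock_held = false;
}

/* Decrement the counter unless it is 1; returns true if it was decremented. */
static bool dec_unless_one(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 1) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return true;
	}
	return false;
}

/* Returns 1 with the lock held and migration disabled if the counter hit 0. */
int model_atomic_dec_and_spin_lock(atomic_int *v)
{
	if (dec_unless_one(v))
		return 0;
	migrate_disable();		/* pin first ... */
	rt_spin_lock();			/* ... then take the possibly sleeping lock */
	if (atomic_fetch_sub(v, 1) == 1)
		return 1;
	rt_spin_unlock();
	migrate_enable();
	return 0;
}
```

On the failure path the sketch unwinds in reverse order, unlock first and migrate_enable() second, so the task stays pinned for as long as it holds the lock.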

This lockdep gripe is from the crashdump of the deadlocked kernel, which
had only the clearly busted bits patched up.

[  193.033224] ======================================================
[  193.033225] [ INFO: possible circular locking dependency detected ]
[  193.033227] 3.12.18-rt25 #19 Not tainted
[  193.033227] -------------------------------------------------------
[  193.033228] boot.kdump/5422 is trying to acquire lock:
[  193.033237]  (&hp->lock){+.+...}, at: [<ffffffff81044974>] pin_current_cpu+0x84/0x1d0
[  193.033238] 
               but task is already holding lock:
[  193.033241]  (tasklist_lock){+.+...}, at: [<ffffffff81046a5b>] do_wait+0xbb/0x2a0
[  193.033242] 
               which lock already depends on the new lock.
               
[  193.033242] 
               the existing dependency chain (in reverse order) is:
[  193.033244] 
               -> #1 (tasklist_lock){+.+...}:
[  193.033248]        [<ffffffff810ae4a8>] check_prevs_add+0xf8/0x180
[  193.033250]        [<ffffffff810aeada>] validate_chain.isra.45+0x5aa/0x750
[  193.033252]        [<ffffffff810af4f6>] __lock_acquire+0x3f6/0x9f0
[  193.033253]        [<ffffffff810b01bc>] lock_acquire+0x8c/0x160
[  193.033257]        [<ffffffff8155e99c>] rt_write_lock+0x2c/0x40
[  193.033260]        [<ffffffff81548169>] _cpu_down+0x219/0x440
[  193.033261]        [<ffffffff815483c0>] cpu_down+0x30/0x50
[  193.033264]        [<ffffffff813711dc>] cpu_subsys_offline+0x1c/0x30
[  193.033267]        [<ffffffff8136c2d5>] device_offline+0x95/0xc0
[  193.033269]        [<ffffffff8136c3e0>] online_store+0x40/0x80
[  193.033271]        [<ffffffff81369813>] dev_attr_store+0x13/0x30
[  193.033274]        [<ffffffff811c8820>] sysfs_write_file+0xf0/0x170
[  193.033277]        [<ffffffff8115a068>] vfs_write+0xc8/0x1d0
[  193.033279]        [<ffffffff8115a500>] SyS_write+0x50/0xa0
[  193.033282]        [<ffffffff81566ca2>] system_call_fastpath+0x16/0x1b
[  193.033284] 
               -> #0 (&hp->lock){+.+...}:
[  193.033286]        [<ffffffff810ae39d>] check_prev_add+0x7bd/0x7d0
[  193.033287]        [<ffffffff810ae4a8>] check_prevs_add+0xf8/0x180
[  193.033289]        [<ffffffff810aeada>] validate_chain.isra.45+0x5aa/0x750
[  193.033291]        [<ffffffff810af4f6>] __lock_acquire+0x3f6/0x9f0
[  193.033293]        [<ffffffff810b01bc>] lock_acquire+0x8c/0x160
[  193.033295]        [<ffffffff8155e6a5>] rt_spin_lock+0x55/0x70
[  193.033296]        [<ffffffff81044974>] pin_current_cpu+0x84/0x1d0
[  193.033299]        [<ffffffff81079ef1>] migrate_disable+0x81/0x100
[  193.033301]        [<ffffffff8155e947>] rt_read_lock+0x47/0x60
[  193.033303]        [<ffffffff81046a5b>] do_wait+0xbb/0x2a0
[  193.033305]        [<ffffffff8104777e>] SyS_wait4+0x9e/0x100
[  193.033307]        [<ffffffff81566ca2>] system_call_fastpath+0x16/0x1b
[  193.033307] 
               other info that might help us debug this:
               
[  193.033308]  Possible unsafe locking scenario:
               
[  193.033309]        CPU0                    CPU1
[  193.033309]        ----                    ----
[  193.033310]   lock(tasklist_lock);
[  193.033312]                                lock(&hp->lock);
[  193.033313]                                lock(tasklist_lock);
[  193.033314]   lock(&hp->lock);
[  193.033315] 
                *** DEADLOCK ***
               
[  193.033316] 1 lock held by boot.kdump/5422:
[  193.033319]  #0:  (tasklist_lock){+.+...}, at: [<ffffffff81046a5b>] do_wait+0xbb/0x2a0
[  193.033320] 
               stack backtrace:
[  193.033322] CPU: 0 PID: 5422 Comm: boot.kdump Not tainted 3.12.18-rt25 #19
[  193.033323] Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007
[  193.033326]  ffff880200550818 ffff8802004e5ad8 ffffffff8155538c 0000000000000000
[  193.033328]  0000000000000000 ffff8802004e5b28 ffffffff8154d0df ffff8802004e5b18
[  193.033330]  ffff8802004e5b50 ffff880200550818 ffff8802005507e0 ffff880200550818
[  193.033331] Call Trace:
[  193.033335]  [<ffffffff8155538c>] dump_stack+0x4f/0x91
[  193.033337]  [<ffffffff8154d0df>] print_circular_bug+0xd3/0xe4
[  193.033339]  [<ffffffff810ae39d>] check_prev_add+0x7bd/0x7d0
[  193.033342]  [<ffffffff8107e1f5>] ? sched_clock_local+0x25/0x90
[  193.033344]  [<ffffffff8107e388>] ? sched_clock_cpu+0xa8/0x120
[  193.033346]  [<ffffffff810ae4a8>] check_prevs_add+0xf8/0x180
[  193.033348]  [<ffffffff810aeada>] validate_chain.isra.45+0x5aa/0x750
[  193.033350]  [<ffffffff810af4f6>] __lock_acquire+0x3f6/0x9f0
[  193.033352]  [<ffffffff8155da11>] ? rt_spin_lock_slowlock+0x231/0x280
[  193.033354]  [<ffffffff8155d911>] ? rt_spin_lock_slowlock+0x131/0x280
[  193.033356]  [<ffffffff81044974>] ? pin_current_cpu+0x84/0x1d0
[  193.033358]  [<ffffffff810b01bc>] lock_acquire+0x8c/0x160
[  193.033360]  [<ffffffff81044974>] ? pin_current_cpu+0x84/0x1d0
[  193.033362]  [<ffffffff8155e6a5>] rt_spin_lock+0x55/0x70
[  193.033363]  [<ffffffff81044974>] ? pin_current_cpu+0x84/0x1d0
[  193.033365]  [<ffffffff81044974>] pin_current_cpu+0x84/0x1d0
[  193.033367]  [<ffffffff81079ef1>] migrate_disable+0x81/0x100
[  193.033369]  [<ffffffff8155e947>] rt_read_lock+0x47/0x60
[  193.033371]  [<ffffffff81046a5b>] ? do_wait+0xbb/0x2a0
[  193.033373]  [<ffffffff8155cd39>] ? schedule+0x29/0x90
[  193.033374]  [<ffffffff81046a5b>] do_wait+0xbb/0x2a0
[  193.033378]  [<ffffffff8112ded6>] ? might_fault+0x56/0xb0
[  193.033380]  [<ffffffff8104777e>] SyS_wait4+0x9e/0x100
[  193.033382]  [<ffffffff81566cc7>] ? sysret_check+0x1b/0x56
[  193.033384]  [<ffffffff81045d50>] ? task_stopped_code+0xa0/0xa0
[  193.033386]  [<ffffffff81566ca2>] system_call_fastpath+0x16/0x1b
[  193.033845] SMP alternatives: lockdep: fixing up alternatives
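
[Editor's sketch] The report above comes down to two acquisition orders that contradict each other: an existing dependency in which tasklist_lock is taken after &hp->lock (chain #1, recorded under _cpu_down()), and a new attempt to take &hp->lock while holding tasklist_lock (chain #0, do_wait() via migrate_disable() and pin_current_cpu()). The toy model below shows how recording "held, then acquired" edges exposes such a cycle. It is not lockdep's implementation; the enum values and record_dependency() are hypothetical names chosen for illustration.

```c
/* Toy model of lock-order checking; this is not how lockdep is implemented. */
#include <assert.h>
#include <stdbool.h>

enum { TASKLIST_LOCK, HP_LOCK, NR_LOCKS };

/* dep[a][b] is true if lock b was acquired while lock a was held. */
static bool dep[NR_LOCKS][NR_LOCKS];

/* Depth-first search: can we get from 'from' to 'to' along recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < NR_LOCKS; n++)
		if (dep[from][n] && !seen[n] && reachable(n, to, seen))
			return true;
	return false;
}

/*
 * Record "next was acquired while holding held".
 * Returns false, without recording, if the new edge would close a cycle,
 * i.e. if held is already reachable from next.
 */
bool record_dependency(int held, int next)
{
	bool seen[NR_LOCKS] = { false };

	assert(held >= 0 && held < NR_LOCKS);
	assert(next >= 0 && next < NR_LOCKS);

	if (reachable(next, held, seen))
		return false;
	dep[held][next] = true;
	return true;
}
```

Recording the pair (HP_LOCK, TASKLIST_LOCK) first and then attempting (TASKLIST_LOCK, HP_LOCK) makes the second call return false, which corresponds to the CPU0/CPU1 scenario printed in the report.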

