Message-ID: <alpine.DEB.2.00.0903181729360.31583@gandalf.stny.rr.com>
Date: Wed, 18 Mar 2009 17:33:39 -0400 (EDT)
From: Steven Rostedt <rostedt@...dmis.org>
To: LKML <linux-kernel@...r.kernel.org>
cc: Ingo Molnar <mingo@...e.hu>, Peter Zijlstra <peterz@...radead.org>,
Rusty Russell <rusty@...tcorp.com.au>
Subject: [BUG] circular lock dependency in tip
I hit this bug in latest tip:
bxrhel51 login: [ 2773.995862]
[ 2773.995867] =======================================================
[ 2773.998502] [ INFO: possible circular locking dependency detected ]
[ 2773.998502] 2.6.29-rc8-tip #760
[ 2773.998502] -------------------------------------------------------
[ 2773.998502] grep/15590 is trying to acquire lock:
[ 2773.998502] (&q->lock){-.-.-.}, at: [<ffffffff80246961>] __wake_up+0x31/0x75
[ 2773.998502]
[ 2773.998502] but task is already holding lock:
[ 2773.998502] (&rq->lock){-.-.-.}, at: [<ffffffff80eaa1b6>] schedule+0x147/0x770
[ 2773.998502]
[ 2773.998502] which lock already depends on the new lock.
[ 2773.998502]
[ 2773.998502]
[ 2773.998502] the existing dependency chain (in reverse order) is:
[ 2773.998502]
[ 2773.998502] -> #1 (&rq->lock){-.-.-.}:
[ 2773.998502] [<ffffffff8028274e>] __lock_acquire+0xa43/0xc0a
[ 2773.998502] [<ffffffff8028392a>] lock_acquire+0x11e/0x15e
[ 2773.998502] [<ffffffff80eacd07>] _spin_lock+0x4b/0x94
[ 2773.998502] [<ffffffff80246bf2>] task_rq_lock+0x5f/0xa7
[ 2773.998502] [<ffffffff8024f97b>] try_to_wake_up+0x40/0x2c1
[ 2773.998502] [<ffffffff8024fc1c>] default_wake_function+0x20/0x36
[ 2773.998502] [<ffffffff80244d2a>] __wake_up_common+0x55/0x9b
[ 2773.998502] [<ffffffff8024688e>] complete+0x47/0x71
[ 2773.998502] [<ffffffff80270727>] kthreadd+0x165/0x196
[ 2773.998502] [<ffffffff8020cd1a>] child_rip+0xa/0x20
[ 2773.998502] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2773.998502]
[ 2773.998502] -> #0 (&q->lock){-.-.-.}:
[ 2773.998502] [<ffffffff80282630>] __lock_acquire+0x925/0xc0a
[ 2773.998502] [<ffffffff8028392a>] lock_acquire+0x11e/0x15e
[ 2773.998502] [<ffffffff80ead75a>] _spin_lock_irqsave+0x63/0xb1
[ 2773.998502] [<ffffffff80246961>] __wake_up+0x31/0x75
[ 2773.998502] [<ffffffff802d4076>] wakeup_kswapd+0x78/0x96
[ 2773.998502] [<ffffffff802cb7b7>] __alloc_pages_internal+0x110/0x407
[ 2773.998502] [<ffffffff802faf8d>] alloc_pages_node+0x47/0x63
[ 2773.998502] [<ffffffff802fb0f3>] alloc_slab_page+0x3e/0x54
[ 2773.998502] [<ffffffff802fb160>] new_slab+0x57/0x20a
[ 2773.998502] [<ffffffff802fb944>] __slab_alloc+0x20c/0x394
[ 2773.998502] [<ffffffff802fbb5b>] kmem_cache_alloc_node_notrace+0x8f/0x106
[ 2773.998502] [<ffffffff8068323a>] alloc_cpumask_var_node+0x48/0xec
[ 2773.998502] [<ffffffff80683302>] alloc_cpumask_var+0x24/0x3d
[ 2773.998502] [<ffffffff80eaa25a>] schedule+0x1eb/0x770
[ 2773.998502] [<ffffffff80eaa8e8>] io_schedule+0x3e/0x65
[ 2773.998502] [<ffffffff802c3c9f>] sync_page+0x52/0x6a
[ 2773.998502] [<ffffffff80eaad61>] __wait_on_bit_lock+0x51/0xb0
[ 2773.998502] [<ffffffff802c3bc1>] __lock_page+0x71/0x8c
Seems that the issue is that we have:
static void idle_balance(int this_cpu, struct rq *this_rq)
{
        struct sched_domain *sd;
        int pulled_task = 0;
        unsigned long next_balance = jiffies + HZ;
        cpumask_var_t tmpmask;

        if (!alloc_cpumask_var(&tmpmask, GFP_ATOMIC))
                return;
[...]
and inside the scheduler we have:
        if (unlikely(!rq->nr_running))
                idle_balance(cpu, rq);
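In other words, the schedule() path takes rq->lock and then, via the
GFP_ATOMIC allocation -> wakeup_kswapd() -> __wake_up(), the waitqueue's
q->lock, while the kthreadd/complete() path takes q->lock first and then
rq->lock via try_to_wake_up() -> task_rq_lock(). A minimal userspace
sketch of that AB-BA pattern (plain pthreads, not kernel code; the lock
and function names here are just stand-ins for illustration):

/* abba.c: hypothetical demo of the lock-order inversion above */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for &rq->lock */
static pthread_mutex_t q_lock  = PTHREAD_MUTEX_INITIALIZER; /* stands in for &q->lock  */

static void *schedule_path(void *arg)
{
        /* mirrors: schedule() holds rq->lock, then the allocation
         * ends up in __wake_up() taking q->lock */
        pthread_mutex_lock(&rq_lock);
        pthread_mutex_lock(&q_lock);
        pthread_mutex_unlock(&q_lock);
        pthread_mutex_unlock(&rq_lock);
        return NULL;
}

static void *complete_path(void *arg)
{
        /* mirrors: complete() holds the waitqueue lock, then
         * try_to_wake_up() -> task_rq_lock() takes rq->lock */
        pthread_mutex_lock(&q_lock);
        pthread_mutex_lock(&rq_lock);
        pthread_mutex_unlock(&rq_lock);
        pthread_mutex_unlock(&q_lock);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, schedule_path, NULL);
        pthread_create(&b, NULL, complete_path, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("no deadlock this run\n");
        return 0;
}

Build with "gcc -pthread abba.c"; with unlucky timing each thread blocks
on the other's lock, which is exactly the cycle lockdep is flagging.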
Attached is the full log.
-- Steve