[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7b6bb4a50910182227y1281b40bj3fcc082d32cf4496@mail.gmail.com>
Date: Mon, 19 Oct 2009 13:27:24 +0800
From: Xiaotian Feng <xtfeng@...il.com>
To: Tilman Schmidt <tilman@...p.cc>
Cc: LKML <linux-kernel@...r.kernel.org>,
isdn4linux <isdn4linux@...tserv.isdn4linux.de>,
Netdev <netdev@...r.kernel.org>
Subject: Re: possible circular locking dependency in ISDN PPP
---> isdn_net_get_locked_lp
--->lock &nd->queue_lock
--->lock &nd->queue->xmit_lock
.....................
---->unlock &nd->queue_lock
---> isdn_net_writebuf_skb (called with &nd->queue->xmit_lock locked)
---->isdn_net_inc_frame_cnt
---->isdn_net_device_busy
----> lock &nd->queue_lock
So there's a circular locking dependency.. Looking into isdn_net_get_locked_lp()
static __inline__ isdn_net_local * isdn_net_get_locked_lp(isdn_net_dev *nd)
{
unsigned long flags;
isdn_net_local *lp;
spin_lock_irqsave(&nd->queue_lock, flags);
lp = nd->queue; /* get lp on top of queue */
spin_lock(&nd->queue->xmit_lock);
while (isdn_net_lp_busy(nd->queue)) {
spin_unlock(&nd->queue->xmit_lock);
nd->queue = nd->queue->next;
if (nd->queue == lp) { /* not found -- should never happen */
lp = NULL;
goto errout;
}
spin_lock(&nd->queue->xmit_lock);
}
lp = nd->queue;
nd->queue = nd->queue->next;
local_bh_disable();
errout:
spin_unlock_irqrestore(&nd->queue_lock, flags);
return lp;
}
Why do we need to hold xmit_lock while using
isdn_net_lp_busy(nd->queue) ? Can following patch fix this bug?
---
diff --git a/drivers/isdn/i4l/isdn_net.h b/drivers/isdn/i4l/isdn_net.h
index 74032d0..7511f08 100644
--- a/drivers/isdn/i4l/isdn_net.h
+++ b/drivers/isdn/i4l/isdn_net.h
@@ -83,19 +83,19 @@ static __inline__ isdn_net_local *
isdn_net_get_locked_lp(isdn_net_dev *nd)
spin_lock_irqsave(&nd->queue_lock, flags);
lp = nd->queue; /* get lp on top of queue */
- spin_lock(&nd->queue->xmit_lock);
while (isdn_net_lp_busy(nd->queue)) {
- spin_unlock(&nd->queue->xmit_lock);
nd->queue = nd->queue->next;
if (nd->queue == lp) { /* not found -- should never happen */
lp = NULL;
goto errout;
}
- spin_lock(&nd->queue->xmit_lock);
}
lp = nd->queue;
nd->queue = nd->queue->next;
+ spin_unlock_irqrestore(&nd->queue_lock, flags);
+ spin_lock(&lp->xmit_lock);
local_bh_disable();
+ return lp;
errout:
spin_unlock_irqrestore(&nd->queue_lock, flags);
return lp;
On Mon, Oct 19, 2009 at 6:16 AM, Tilman Schmidt <tilman@...p.cc> wrote:
> A test of PPP over ISDN with ipppd, capidrv and the so far unmerged
> CAPI port of the Gigaset driver produced the following lockdep
> message:
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.32-rc4-testing #7
> -------------------------------------------------------
> ipppd/28379 is trying to acquire lock:
> (&netdev->queue_lock){......}, at: [<e62ad0fd>] isdn_net_device_busy+0x2c/0x74 [isdn]
>
> but task is already holding lock:
> (&netdev->local->xmit_lock){+.....}, at: [<e62aefc2>] isdn_net_write_super+0x3f/0x6e [isdn]
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&netdev->local->xmit_lock){+.....}:
> [<c0157e9c>] __lock_acquire+0xa12/0xb99
> [<c01580ac>] lock_acquire+0x89/0xa0
> [<c0373249>] _spin_lock+0x1b/0x2a
> [<e62b9d1c>] isdn_ppp_xmit+0xf0/0x5b0 [isdn]
> [<e62b03f0>] isdn_net_start_xmit+0x4c6/0x66b [isdn]
> [<c0307e75>] dev_hard_start_xmit+0x251/0x2e4
> [<c0317bcc>] sch_direct_xmit+0x4f/0x122
> [<c030829c>] dev_queue_xmit+0x2ae/0x412
> [<c030d748>] neigh_resolve_output+0x1f2/0x23c
> [<c0329b9c>] ip_finish_output2+0x1b1/0x1db
> [<c0329c25>] ip_finish_output+0x5f/0x62
> [<c0329cb5>] ip_output+0x8d/0x92
> [<c03290c0>] ip_local_out+0x18/0x1b
> [<c032932c>] ip_push_pending_frames+0x269/0x2c1
> [<c033fd78>] raw_sendmsg+0x618/0x6b0
> [<c0347ac9>] inet_sendmsg+0x3b/0x48
> [<c02fa5b5>] __sock_sendmsg+0x45/0x4e
> [<c02fad4b>] sock_sendmsg+0xb8/0xce
> [<c02faea0>] sys_sendmsg+0x13f/0x192
> [<c02fbeb2>] sys_socketcall+0x157/0x18e
> [<c0102974>] sysenter_do_call+0x12/0x32
>
> -> #0 (&netdev->queue_lock){......}:
> [<c0157da9>] __lock_acquire+0x91f/0xb99
> [<c01580ac>] lock_acquire+0x89/0xa0
> [<c03732db>] _spin_lock_irqsave+0x24/0x34
> [<e62ad0fd>] isdn_net_device_busy+0x2c/0x74 [isdn]
> [<e62aeee3>] isdn_net_writebuf_skb+0x6e/0xc2 [isdn]
> [<e62aefd4>] isdn_net_write_super+0x51/0x6e [isdn]
> [<e62bc26f>] isdn_ppp_write+0x3a8/0x3bc [isdn]
> [<e62b785a>] isdn_write+0x1d9/0x1f9 [isdn]
> [<c01c42c5>] vfs_write+0x84/0xdf
> [<c01c43b9>] sys_write+0x3b/0x60
> [<c0102974>] sysenter_do_call+0x12/0x32
>
> other info that might help us debug this:
>
> 1 lock held by ipppd/28379:
> #0: (&netdev->local->xmit_lock){+.....}, at: [<e62aefc2>] isdn_net_write_super+0x3f/0x6e [isdn]
>
> stack backtrace:
> Pid: 28379, comm: ipppd Not tainted 2.6.32-rc4-testing #7
> Call Trace:
> [<c03710dc>] ? printk+0xf/0x13
> [<c015714d>] print_circular_bug+0x90/0x9c
> [<c0157da9>] __lock_acquire+0x91f/0xb99
> [<c01580ac>] lock_acquire+0x89/0xa0
> [<e62ad0fd>] ? isdn_net_device_busy+0x2c/0x74 [isdn]
> [<c03732db>] _spin_lock_irqsave+0x24/0x34
> [<e62ad0fd>] ? isdn_net_device_busy+0x2c/0x74 [isdn]
> [<e62ad0fd>] isdn_net_device_busy+0x2c/0x74 [isdn]
> [<e62aeee3>] isdn_net_writebuf_skb+0x6e/0xc2 [isdn]
> [<e62aefd4>] isdn_net_write_super+0x51/0x6e [isdn]
> [<e62bc26f>] isdn_ppp_write+0x3a8/0x3bc [isdn]
> [<e62b785a>] isdn_write+0x1d9/0x1f9 [isdn]
> [<c01c3b6c>] ? rw_verify_area+0x8a/0xad
> [<e62b7681>] ? isdn_write+0x0/0x1f9 [isdn]
> [<c01c42c5>] vfs_write+0x84/0xdf
> [<c01c43b9>] sys_write+0x3b/0x60
> [<c0102974>] sysenter_do_call+0x12/0x32
>
> The message appeared shortly after initiating the connection,
> during the PPP negotiation, just when the IP address was assigned.
> The system continued to run normally, and the connection was
> successfully established. Full log showing the entire connection
> (with capidrv and Gigaset driver debugging output enabled, 70 kB),
> available at http://www.phoenixsoftware.de/~ts/ppp-lockprob-full.log
> in case someone's interested. It shows the messages from ipppd
> about the IP address assignment arriving in the middle of the
> lockdep message.
>
> I cannot say whether this is a regression. My previous tests of
> that scenario were done on a machine with an Nvidia graphics card
> where the lockdep machinery would refuse to run because of the
> kernel being tainted by the Nvidia driver, so I wouldn't have seen
> anything one way or another.
>
> Btw, one of those "NOHZ: local_softirq_pending 08" messages is also
> present in the log, but that's 28 seconds later so I'd be surprised
> if the two were related.
>
> Any hints about the possible cause and seriousness of that
> message would be welcome. I'm particularly interested, of course,
> in finding out whether the Gigaset driver might somehow be causing
> it, even though it doesn't appear anywhere in the backtraces.
>
> aTdHvAaNnKcSe
> Tilman
>
> --
> Tilman Schmidt E-Mail: tilman@...p.cc
> Bonn, Germany
> Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
> Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists