[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200611191701.21593.mb@bu3sch.de>
Date: Sun, 19 Nov 2006 17:01:21 +0100
From: Michael Buesch <mb@...sch.de>
To: Larry Finger <Larry.Finger@...inger.net>
Cc: Johannes Berg <johannes@...solutions.net>,
Joseph Fannin <jhf@...umbus.rr.com>,
Andrew Morton <akpm@...l.org>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, Michael Buesch <mb@...sch.de>,
Bcm43xx-dev@...ts.berlios.denunk, Adrian Bunk <bunk@...sta.de>,
Ray Lee <ray-lk@...rabbit.org>
Subject: Re: bcm43xx regression 2.6.19rc3 -> rc5, rtnl_lock trouble?
On Saturday 18 November 2006 20:02, Larry Finger wrote:
> Ray Lee wrote:
> > Larry Finger wrote:
> >> Johannes Berg wrote:
> >>> Hah, that's a lot more plausible than bcm43xx's drain patch actually
> >>> causing this. So maybe somehow interrupts for bcm43xx aren't routed
> >>> properly or something...
> >>>
> >>> Ray, please check /proc/interrupts when this happens.
> >
> > When it happens, I can't. The keyboard is entirely dead (I'm in X, perhaps at
> > a console it would be okay). The only thing that works is magic SysRq. even
> > ctrl-alt-f1 to get to a console doesn't work.
> >
> > That said, /proc/interrupts doesn't show MSI routed things on my AMD64 laptop.
> >
> >>> I am convinced that the patch in question (drain tx status) is not
> >>> causing this -- the patch should be a no-op in most cases anyway, and in
> >>> those cases where it isn't a no-op it'll run only once at card init and
> >>> remove some things from a hardware-internal FIFO.
> >
> > Okay, I can buy that.
> >
> >> I agree that drain tx status should not cause the problem.
> >>
> >> Ray, does -rc6 solve your problem as it did for Joseph?
> >
> > I can't get it to repeat other than the first two times. However, I
> > accidentally stopped NetworkManager from handling my wireless a few days ago,
> > and haven't restarted it, so that may play into this.
> >
> > Humor me one last time, I beg. Did you look at the messages file I posted? (Or
> > maybe I didn't include this second bit... Damn, I need to be more careful with
> > cutting and pasting...)
>
> The locking stuff wasn't in any of the messages that I received.
>
> > The second sysrq-t shows locking stuff going on, can you tell me if it looks
> > reasonable? It still seems to me that something acquiring and not releasing
> > rtnl_lock explains what I was seeing (rtnl lock is implicated in both sysrq-t
> > backtraces). I don't know if that thing is bcm43xx, though.
> >
> > Is this part reasonable?:
> > 1 lock held by events/0/4:
> > #0: (&bcm->mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
> > 2 locks held by NetworkManager/4837:
> > #0: (rtnl_mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
> > #1: (&bcm->mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
> > 1 lock held by wpa_supplicant/5953:
> > #0: (rtnl_mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
>
> I'm not an expert on locking, but it certainly looks as if bcm43xx and wpa_supplicant are OK by
> themselves, but that NetworkManager interferes. This behavior matches what I see - I don't have
> NetworkManager on my system, but I do use wpa_supplicant, with no lockups. Of course, I have i386
> architecture.
>
> Although NetworkManager may be the catalyst to trigger the bug, I doubt that it is the cause.
> Strictly as a guess, I would suspect that the locking problem is in SoftMAC, where we know there can
> be locking difficulties, but no one is fixing them because EOL is near for that component.
Well, this is different. The (known) locking problems are of type "missing locking".
But this seems to be a deadlock or something, so clearly a bug that
must be fixed. Even in softmac.
I don't know yet how this can happen, though.
--
Greetings Michael.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists