Message-ID: <455F58AC.3030801@lwfinger.net>
Date: Sat, 18 Nov 2006 13:02:04 -0600
From: Larry Finger <Larry.Finger@...inger.net>
To: Ray Lee <ray-lk@...rabbit.org>
CC: Johannes Berg <johannes@...solutions.net>,
Joseph Fannin <jhf@...umbus.rr.com>,
Andrew Morton <akpm@...l.org>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, Michael Buesch <mb@...sch.de>,
Bcm43xx-dev@...ts.berlios.denunk, Adrian Bunk <bunk@...sta.de>
Subject: Re: bcm43xx regression 2.6.19rc3 -> rc5, rtnl_lock trouble?
Ray Lee wrote:
> Larry Finger wrote:
>> Johannes Berg wrote:
>>> Hah, that's a lot more plausible than bcm43xx's drain patch actually
>>> causing this. So maybe somehow interrupts for bcm43xx aren't routed
>>> properly or something...
>>>
>>> Ray, please check /proc/interrupts when this happens.
>
> When it happens, I can't. The keyboard is entirely dead (I'm in X; perhaps at
> a console it would be okay). The only thing that works is magic SysRq. Even
> Ctrl-Alt-F1 to get to a console doesn't work.
>
> That said, /proc/interrupts doesn't show MSI-routed things on my AMD64 laptop.
>
>>> I am convinced that the patch in question (drain tx status) is not
>>> causing this -- the patch should be a no-op in most cases anyway, and in
>>> those cases where it isn't a no-op it'll run only once at card init and
>>> remove some things from a hardware-internal FIFO.
>
> Okay, I can buy that.
>
>> I agree that drain tx status should not cause the problem.
>>
>> Ray, does -rc6 solve your problem as it did for Joseph?
>
> I haven't been able to reproduce it since the first two times. However, I
> accidentally stopped NetworkManager from handling my wireless a few days ago
> and haven't restarted it, so that may play into this.
>
> Humor me one last time, I beg. Did you look at the messages file I posted? (Or
> maybe I didn't include this second bit... Damn, I need to be more careful with
> cutting and pasting...)
The locking stuff wasn't in any of the messages that I received.
> The second sysrq-t shows locking activity; can you tell me if it looks
> reasonable? It still seems to me that something acquiring and not releasing
> rtnl_lock explains what I was seeing (rtnl_lock is implicated in both sysrq-t
> backtraces). I don't know if that thing is bcm43xx, though.
>
> Is this part reasonable?:
> 1 lock held by events/0/4:
> #0: (&bcm->mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
> 2 locks held by NetworkManager/4837:
> #0: (rtnl_mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
> #1: (&bcm->mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
> 1 lock held by wpa_supplicant/5953:
> #0: (rtnl_mutex){--..}, at: [mutex_lock+9/16] mutex_lock+0x9/0x10
I'm not an expert on locking, but it certainly looks as if bcm43xx and wpa_supplicant are OK by
themselves and that NetworkManager interferes. This matches what I see: I don't have
NetworkManager on my system, but I do use wpa_supplicant, and I get no lockups. Of course, my
machine is i386 rather than AMD64.
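To make that reading concrete, here is a small Python model (a thinking aid, not kernel code) of the
lock dump quoted above. It rests on an assumption: as far as I know, lockdep lists a mutex among a
task's held locks as soon as the task starts acquiring it, so the dump alone does not say which
entries are owned and which are still being waited for. The HOLDS and WAITS_ON tables are one
hypothetical assignment, and owner_of() and wait_chain() are illustrative helpers, not anything in
bcm43xx or SoftMAC.

```python
# Hypothetical model of the sysrq lock dump. Which entries are held and which
# are only being waited for is an ASSUMPTION: a task can appear in lockdep's
# held-locks list while still blocked trying to acquire that lock.
HOLDS = {
    "events/0": "&bcm->mutex",
    "NetworkManager": "rtnl_mutex",
}
WAITS_ON = {
    "NetworkManager": "&bcm->mutex",
    "wpa_supplicant": "rtnl_mutex",
}


def owner_of(lock):
    """Return the task assumed to own `lock`, or None if nobody does."""
    for task, held in HOLDS.items():
        if held == lock:
            return task
    return None


def wait_chain(task):
    """Follow task -> owner of the lock it waits on -> ... until the chain ends."""
    chain = [task]
    while task in WAITS_ON:
        owner = owner_of(WAITS_ON[task])
        if owner is None:
            break  # lock is free in this model, so the wait ends here
        chain.append(owner)
        if chain.count(owner) > 1:
            break  # owner already seen: a genuine deadlock cycle
        task = owner
    return chain


if __name__ == "__main__":
    for t in ("events/0", "NetworkManager", "wpa_supplicant"):
        print(" -> ".join(wait_chain(t)))
```

Under this assumption every chain ends at events/0, the kernel workqueue thread holding
&bcm->mutex. If that work item never completes, NetworkManager never gets &bcm->mutex, never
drops rtnl_mutex, and wpa_supplicant waits forever, which would look exactly like rtnl_lock being
taken and not released.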
Although NetworkManager may be the catalyst that triggers the bug, I doubt that it is the cause.
Strictly as a guess, I suspect that the locking problem is in SoftMAC, where we know there can be
locking difficulties that no one is fixing because that component is nearing end of life.
Larry