[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <86ac2jekn2.fsf@sif.msconsult.dk>
Date:	Wed, 22 Nov 2006 13:22:25 +0100
From:	rasmus@...onsult.dk (Rasmus Bøg Hansen)
To:	rasmus@...onsult.dk (Rasmus Bøg Hansen)
Cc:	Oleg Verych <olecom@...wer.upol.cz>,
	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...l.org>
Subject: Re: smbfs (Re: BUG: soft lockup detected on CPU#0! (2.6.18.2))
rasmus@...onsult.dk (Rasmus Bøg Hansen) writes:
> Oleg Verych <olecom@...wer.upol.cz> writes:
>
>> [ Adding e-mail of Andrew Morton, he may have clue about who to ping ;]
>> [ MAINTAINERS.smbfs seems to be emply                                 ]
>>
>> On 2006-11-14, Rasmus BЬg Hansen wrote:
>> []
>>> [1.] One line summary of the problem:
>>>
>>> Kernel BUG's and freezes after a soft lockup.
>>>
>>> [2.] Full description of the problem/report:
>>>
>>> The night before sunday, my server froze. It was entirely dead and had
>>> to be power cycled. There was no seriel console connected but it
>>> managed to log a short BUG before, which seems related to smbfs.
>>>
>>> As it happened in the night, I am unsure what triggered the bug, but
>>> it was during the nightly backup routines, which includes running
>>> rsync over ssh (over ADSL so pretty slow) and writing some large
>>> .tar.bz2 to a smbfs drive. I assume (but do no know for sure) that it
>>> was the last one that triggered the bug.
>>
>> Nobody seems to picked this up. So.
>> Why don't you try debian's kernel 2.6.18 from unstable?
>
> I haven't tried that, partially to get rid of the initrd and have
> drivers for the controllers/disks in-kernel, partially because I like
> having built my own kernel. That might be silly, of course - I can try
> the Debian kernel, if you think it would make a difference.
>
>> I see, you've build it yourself, then try to enable some more locking
>> debuging in the "kernel hacking" section.
>
> I am now running with all lock debugging turned on. "Unfortunately"
> the machine has been running stable since the crash last weekend -
> perhaps the extra-large backup rourines at sunday may trigger it
> again...
>
>> (gitweb down, i can't check history of smbfs, and i have amd64 arch, anyway)
>>> Nov 12 03:54:57 gere kernel: BUG: soft lockup detected on CPU#0!
>>> Nov 12 03:54:57 gere kernel:  [softlockup_tick+170/195] softlockup_tick+0xaa/0xc3
>>> Nov 12 03:54:57 gere kernel:  [update_process_times+56/137] update_process_times+0x38/0x89
>>> Nov 12 03:54:57 gere kernel:  [smp_apic_timer_interrupt+105/117] smp_apic_timer_interrupt+0x69/0x75
>>> Nov 12 03:54:57 gere kernel:  [smbiod+238/348] smbiod+0xee/0x15c
>> this
>>
>>> Nov 12 03:54:57 gere kernel:  [apic_timer_interrupt+31/36] apic_timer_interrupt+0x1f/0x24
>>> Nov 12 03:54:57 gere kernel:  [journal_init_revoke+49/678] journal_init_revoke+0x31/0x2a6
>>> Nov 12 03:54:57 gere kernel:  [smbiod+238/348] smbiod+0xee/0x15c
>> and this *may be* double (un)lock.
>
> Hopefully lock debugging will tell.
I got this - I think it was this morning (somehow kernel logging was
disabled so I can't tell the exact time):
----
=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
nfsd/1788 is trying to acquire lock:
 (&inode->i_mutex){--..}, at: [<c02cc35c>] mutex_lock+0x8/0xa
but task is already holding lock:
 (&inode->i_mutex){--..}, at: [<c02cc35c>] mutex_lock+0x8/0xa
other info that might help us debug this:
2 locks held by nfsd/1788:
 #0:  (hash_sem){----}, at: [<e0930d99>] exp_readlock+0x12/0x16 [nfsd]
 #1:  (&inode->i_mutex){--..}, at: [<c02cc35c>] mutex_lock+0x8/0xa
stack backtrace:
 [<c0103c10>] show_trace+0x27/0x2b
 [<c0103d2d>] dump_stack+0x26/0x2a
 [<c01353ba>] print_deadlock_bug+0xb5/0xba
 [<c0135420>] check_deadlock+0x61/0x71
 [<c0136da3>] __lock_acquire+0x334/0x9b6
 [<c0137b3d>] lock_acquire+0x75/0x90
 [<c02cb9a1>] __mutex_lock_slowpath+0x85/0x270
 [<c02cc35c>] mutex_lock+0x8/0xa
 [<e092cb75>] nfsd_setattr+0x3ca/0x574 [nfsd]
 [<e092e3a8>] nfsd_create_v3+0x3a6/0x540 [nfsd]
 [<e0934d4b>] nfsd3_proc_create+0x118/0x161 [nfsd]
 [<e0929751>] nfsd_dispatch+0xd8/0x1ff [nfsd]
 [<e08e8503>] svc_process+0x4e5/0x6da [sunrpc]
 [<e09294e7>] nfsd+0x1cc/0x35e [nfsd]
 [<c01010b9>] kernel_thread_helper+0x5/0xb
----
Also it doesn't seem to be related to smbfs AFAICS - I am not enough
into lock debugging to say if this is relevant, useful or just
noise...
Regards
/Rasmus
-- 
Rasmus Bøg Hansen
MSC Aps
Bøgesvinget 8
2740 Skovlunde
44 53 93 66
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Powered by blists - more mailing lists
 
