Message-ID: <867ixyvum6.fsf@gere.msconsult.dk>
Date: Tue, 14 Nov 2006 17:52:17 +0100
From: rasmus@...onsult.dk (Rasmus Bøg Hansen)
To: linux-kernel@...r.kernel.org
Subject: BUG: soft lockup detected on CPU#0! (2.6.18.2)
[1.] One line summary of the problem:
Kernel logs a BUG and the machine freezes after a soft lockup.
[2.] Full description of the problem/report:
During the night before Sunday, my server froze. It was entirely dead
and had to be power cycled. There was no serial console connected, but
it managed to log a short BUG message before freezing, which seems
related to smbfs.
As it happened during the night, I am unsure what triggered the bug,
but it occurred during the nightly backup routines, which include
running rsync over ssh (over ADSL, so pretty slow) and writing some
large .tar.bz2 files to an smbfs drive. I assume (but do not know for
sure) that it was the smbfs write that triggered the bug.
[3.] Keywords (i.e., modules, networking, kernel):
soft lockup, smbfs, SMP
[4.] Kernel version (from /proc/version):
Linux version 2.6.18.2 (root@...e) (gcc version 3.3.5 (Debian
1:3.3.5-13)) #1 SMP Wed Nov 8 10:00:34 CET 2006
[5.] Most recent kernel version which did not have the bug:
I have never seen it before. The machine has also been running
2.6.18.1 and 2.6.18.
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/oops-tracing.txt)
Nov 12 03:54:57 gere kernel: BUG: soft lockup detected on CPU#0!
Nov 12 03:54:57 gere kernel: [softlockup_tick+170/195] softlockup_tick+0xaa/0xc3
Nov 12 03:54:57 gere kernel: [update_process_times+56/137] update_process_times+0x38/0x89
Nov 12 03:54:57 gere kernel: [smp_apic_timer_interrupt+105/117] smp_apic_timer_interrupt+0x69/0x75
Nov 12 03:54:57 gere kernel: [smbiod+238/348] smbiod+0xee/0x15c
Nov 12 03:54:57 gere kernel: [apic_timer_interrupt+31/36] apic_timer_interrupt+0x1f/0x24
Nov 12 03:54:57 gere kernel: [journal_init_revoke+49/678] journal_init_revoke+0x31/0x2a6
Nov 12 03:54:57 gere kernel: [smbiod+238/348] smbiod+0xee/0x15c
Nov 12 03:54:57 gere kernel: [__wake_up_common+63/94] __wake_up_common+0x3f/0x5e
Nov 12 03:54:57 gere kernel: [autoremove_wake_function+0/87] autoremove_wake_function+0x0/0x57
Nov 12 03:54:57 gere kernel: [autoremove_wake_function+0/87] autoremove_wake_function+0x0/0x57
Nov 12 03:54:57 gere kernel: [smbiod+0/348] smbiod+0x0/0x15c
Nov 12 03:54:57 gere kernel: [kthread+191/195] kthread+0xbf/0xc3
Nov 12 03:54:57 gere kernel: [kthread+0/195] kthread+0x0/0xc3
Nov 12 03:54:57 gere kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb
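In case anyone wants to handle the trace above programmatically, here
is a minimal Python sketch that pulls the symbol, offset and function
size out of each logged line. The names parse_frames and FRAME_RE are
made up for this illustration, and the regular expression assumes the
exact log layout shown above (a bracketed decimal form followed by the
symbol+0xoffset/0xsize form), so treat it as a starting point rather
than a general oops parser.

```python
import re

# Hypothetical helper for this report's log layout only: matches the
# trailing "symbol+0xOFFSET/0xSIZE" part that follows the "[...]" form.
FRAME_RE = re.compile(r"\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)\s*$")


def parse_frames(lines):
    """Return (symbol, offset, size) tuples for every line that matches."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            # Offset and size are logged in hex; convert them to integers.
            frames.append((m.group(1), int(m.group(2), 16), int(m.group(3), 16)))
    return frames


# Three lines copied from the trace above.
SAMPLE = [
    "Nov 12 03:54:57 gere kernel: [softlockup_tick+170/195] softlockup_tick+0xaa/0xc3",
    "Nov 12 03:54:57 gere kernel: [smbiod+238/348] smbiod+0xee/0x15c",
    "Nov 12 03:54:57 gere kernel: [kthread+191/195] kthread+0xbf/0xc3",
]

frames = parse_frames(SAMPLE)
for symbol, offset, size in frames:
    print(f"{symbol}: offset {offset} of {size} bytes")
```

The decimal numbers in brackets and the hex numbers after them are the
same values (for example 0xee = 238 and 0x15c = 348 for smbiod), so the
sketch only needs to read one of the two forms.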
[7.] A small shell script or example program which triggers the
problem (if possible)
I am not sure how to reproduce the bug reliably, as the backup
routines have run flawlessly both before and since.
[8.] Environment
[8.1.] Software (add the output of the ver_linux script here)
Debian stable with a few backports - output from ver_linux:
Linux gere 2.6.18.2 #1 SMP Wed Nov 8 10:00:34 CET 2006 i686 GNU/Linux
Gnu C 3.3.5
Gnu make 3.80
binutils 2.15
util-linux 2.12p
mount 2.12p
module-init-tools 3.2-pre1
e2fsprogs 1.37
nfs-utils 1.0.6
Linux C Library 2.3.2
Dynamic linker (ldd) 2.3.2
Procps 3.2.1
Net-tools 1.60
Kbd [option...]
Console-tools 0.2.3
Sh-utils 5.2.1
Modules Loaded nls_cp865 nls_iso8859_15 nfsd exportfs lockd nfs_acl sunrpc parport_pc lp parport autofs4 dm_mod eeprom lm85 hwmon_vid hwmon i2c_i801 i2c_core rtc
[8.2.] Processor information (from /proc/cpuinfo):
P4 2.8GHz, running HT.
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 3
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 3
cpu MHz : 2793.144
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pni monitor ds_cpl cid
bogomips : 5589.25
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 3
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 3
cpu MHz : 2793.144
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pni monitor ds_cpl cid
bogomips : 5586.19
[8.3.] Module information (from /proc/modules):
nls_cp865 9856 0 - Live 0xe08ae000
nls_iso8859_15 8832 0 - Live 0xe08aa000
nfsd 109672 2 - Live 0xe0932000
exportfs 9216 1 nfsd, Live 0xe08a6000
lockd 69128 2 nfsd, Live 0xe08d5000
nfs_acl 7552 1 nfsd, Live 0xe0873000
sunrpc 155580 3 nfsd,lockd,nfs_acl, Live 0xe08ea000
parport_pc 38212 1 - Live 0xe08b4000
lp 13988 0 - Live 0xe0850000
parport 38600 2 parport_pc,lp, Live 0xe088a000
autofs4 23812 1 - Live 0xe0883000
dm_mod 60696 0 - Live 0xe0896000
eeprom 10128 0 - Live 0xe086b000
lm85 37284 0 - Live 0xe0878000
hwmon_vid 7168 1 lm85, Live 0xe0868000
hwmon 6788 1 lm85, Live 0xe085d000
i2c_i801 11532 0 - Live 0xe0859000
i2c_core 22528 3 eeprom,lm85,i2c_i801, Live 0xe0861000
rtc 12052 0 - Live 0xe0855000
[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
# cat /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000f0000-000fffff : System ROM
00100000-1fe2ffff : System RAM
00100000-002c62ca : Kernel code
002c62cb-003c9703 : Kernel data
1fe30000-1fe4149f : ACPI Non-volatile Storage
1fe414a0-1ff2ffff : System RAM
1ff30000-1ff3ffff : ACPI Tables
1ff40000-1ffeffff : ACPI Non-volatile Storage
1fff0000-1fffffff : reserved
30000000-300fffff : PCI Bus #03
30000000-3001ffff : 0000:03:06.0
30100000-301003ff : 0000:00:1f.1
f8000000-fbffffff : 0000:00:00.0
fc900000-fc9fffff : PCI Bus #02
fc9e0000-fc9fffff : 0000:02:01.0
fca00000-feafffff : PCI Bus #03
fd000000-fdffffff : 0000:03:06.0
feafe000-feafefff : 0000:03:08.0
feafe000-feafefff : e100
feaff000-feafffff : 0000:03:06.0
febffc00-febfffff : 0000:00:1d.7
fecf0000-fecf0fff : reserved
fed20000-fed9ffff : reserved
# cat /proc/ioports
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0376-0376 : ide1
0378-037a : parport0
037b-037f : parport0
03c0-03df : vga+
03f6-03f6 : ide0
0400-047f : 0000:00:1f.0
0400-0403 : ACPI PM1a_EVT_BLK
0404-0405 : ACPI PM1a_CNT_BLK
0408-040b : ACPI PM_TMR
0420-0420 : ACPI PM2_CNT_BLK
0428-042f : ACPI GPE0_BLK
0500-053f : 0000:00:1f.0
0cf8-0cff : PCI conf1
a000-afff : PCI Bus #02
ac00-ac1f : 0000:02:01.0
b000-bfff : PCI Bus #03
b800-b8ff : 0000:03:06.0
bc00-bc3f : 0000:03:08.0
bc00-bc3f : e100
c800-c81f : 0000:00:1f.3
c800-c81f : i801_smbus
cc00-cc1f : 0000:00:1d.0
d000-d01f : 0000:00:1d.1
d400-d41f : 0000:00:1d.2
d800-d81f : 0000:00:1d.3
dc00-dc0f : 0000:00:1f.2
dc00-dc0f : libata
e000-e003 : 0000:00:1f.2
e000-e003 : libata
e400-e407 : 0000:00:1f.2
e400-e407 : libata
e800-e803 : 0000:00:1f.2
e800-e803 : libata
ec00-ec07 : 0000:00:1f.2
ec00-ec07 : libata
ffa0-ffaf : 0000:00:1f.1
ffa0-ffa7 : ide0
ffa8-ffaf : ide1
[8.5.] PCI information ('lspci -vvv' as root)
View attachment "lspci.txt" of type "text/plain" (8930 bytes)
[8.6.] SCSI information (from /proc/scsi/scsi)
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST3250823AS Rev: 3.03
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST3250823AS Rev: 3.03
Type: Direct-Access ANSI SCSI revision: 05
[8.7.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):
The system runs from three ATA disks in RAID1 (one disk used as spare)
with some data on those disks too - the rest resides on the two SATA
disks (also RAID1).
The machine acts as a multipurpose server (mail, web, file server). It
is not under particularly high load and has never shown this behaviour
before.
The entire dmesg output (from kern.log) might be useful as well as my
.config:
View attachment "kern.log" of type "text/plain" (26252 bytes)
View attachment "config-2.6.18.2" of type "text/plain" (33471 bytes)
I will, of course, post further information if needed.
Regards
/Rasmus
--
Rasmus Bøg Hansen
MSC Aps
Bøgesvinget 8
2740 Skovlunde
44 53 93 66