Date: Tue, 14 Nov 2006 17:52:17 +0100
From: rasmus@...onsult.dk (Rasmus Bøg Hansen)
To: linux-kernel@...r.kernel.org
Subject: BUG: soft lockup detected on CPU#0! (2.6.18.2)

[1.] One line summary of the problem:

Kernel BUGs and freezes after a soft lockup.

[2.] Full description of the problem/report:

On the night before Sunday, my server froze. It was entirely dead and
had to be power cycled. There was no serial console connected, but it
managed to log a short BUG beforehand, which seems related to smbfs.

As it happened during the night, I am unsure what triggered the bug, but
it was during the nightly backup routines, which include running rsync
over ssh (over ADSL, so pretty slow) and writing some large .tar.bz2
files to an smbfs drive. I assume (but do not know for sure) that it was
the latter that triggered the bug.

[3.] Keywords (i.e., modules, networking, kernel):

soft lockup, smbfs, SMP

[4.] Kernel version (from /proc/version):

Linux version 2.6.18.2 (root@...e) (gcc version 3.3.5 (Debian 1:3.3.5-13)) #1 SMP Wed Nov 8 10:00:34 CET 2006

[5.] Most recent kernel version which did not have the bug:

I never saw it before - the machine has been running 2.6.18.1 as well as
2.6.18.

[6.] Output of Oops.. message (if applicable) with symbolic information
     resolved (see Documentation/oops-tracing.txt)

Nov 12 03:54:57 gere kernel: BUG: soft lockup detected on CPU#0!
Nov 12 03:54:57 gere kernel:  [softlockup_tick+170/195] softlockup_tick+0xaa/0xc3
Nov 12 03:54:57 gere kernel:  [update_process_times+56/137] update_process_times+0x38/0x89
Nov 12 03:54:57 gere kernel:  [smp_apic_timer_interrupt+105/117] smp_apic_timer_interrupt+0x69/0x75
Nov 12 03:54:57 gere kernel:  [smbiod+238/348] smbiod+0xee/0x15c
Nov 12 03:54:57 gere kernel:  [apic_timer_interrupt+31/36] apic_timer_interrupt+0x1f/0x24
Nov 12 03:54:57 gere kernel:  [journal_init_revoke+49/678] journal_init_revoke+0x31/0x2a6
Nov 12 03:54:57 gere kernel:  [smbiod+238/348] smbiod+0xee/0x15c
Nov 12 03:54:57 gere kernel:  [__wake_up_common+63/94] __wake_up_common+0x3f/0x5e
Nov 12 03:54:57 gere kernel:  [autoremove_wake_function+0/87] autoremove_wake_function+0x0/0x57
Nov 12 03:54:57 gere kernel:  [autoremove_wake_function+0/87] autoremove_wake_function+0x0/0x57
Nov 12 03:54:57 gere kernel:  [smbiod+0/348] smbiod+0x0/0x15c
Nov 12 03:54:57 gere kernel:  [kthread+191/195] kthread+0xbf/0xc3
Nov 12 03:54:57 gere kernel:  [kthread+0/195] kthread+0x0/0xc3
Nov 12 03:54:57 gere kernel:  [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb

[7.] A small shell script or example program which triggers the
     problem (if possible)

I am not sure how to reproduce the bug reliably, as the backup routines
have run flawlessly both before and since.

[8.] Environment

[8.1.] Software (add the output of the ver_linux script here)

Debian stable with a few backports - output from ver_linux:

Linux gere 2.6.18.2 #1 SMP Wed Nov 8 10:00:34 CET 2006 i686 GNU/Linux

Gnu C                  3.3.5
Gnu make               3.80
binutils               2.15
util-linux             2.12p
mount                  2.12p
module-init-tools      3.2-pre1
e2fsprogs              1.37
nfs-utils              1.0.6
Linux C Library        2.3.2
Dynamic linker (ldd)   2.3.2
Procps                 3.2.1
Net-tools              1.60
Kbd                    [tilvalg...]
Console-tools          0.2.3
Sh-utils               5.2.1
Modules Loaded         nls_cp865 nls_iso8859_15 nfsd exportfs lockd nfs_acl sunrpc parport_pc lp parport autofs4 dm_mod eeprom lm85 hwmon_vid hwmon i2c_i801 i2c_core rtc

[8.2.] Processor information (from /proc/cpuinfo):

P4 2.8GHz, running HT.

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 3
cpu MHz         : 2793.144
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pni monitor ds_cpl cid
bogomips        : 5589.25

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 3
cpu MHz         : 2793.144
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pni monitor ds_cpl cid
bogomips        : 5586.19

[8.3.] Module information (from /proc/modules):

nls_cp865 9856 0 - Live 0xe08ae000
nls_iso8859_15 8832 0 - Live 0xe08aa000
nfsd 109672 2 - Live 0xe0932000
exportfs 9216 1 nfsd, Live 0xe08a6000
lockd 69128 2 nfsd, Live 0xe08d5000
nfs_acl 7552 1 nfsd, Live 0xe0873000
sunrpc 155580 3 nfsd,lockd,nfs_acl, Live 0xe08ea000
parport_pc 38212 1 - Live 0xe08b4000
lp 13988 0 - Live 0xe0850000
parport 38600 2 parport_pc,lp, Live 0xe088a000
autofs4 23812 1 - Live 0xe0883000
dm_mod 60696 0 - Live 0xe0896000
eeprom 10128 0 - Live 0xe086b000
lm85 37284 0 - Live 0xe0878000
hwmon_vid 7168 1 lm85, Live 0xe0868000
hwmon 6788 1 lm85, Live 0xe085d000
i2c_i801 11532 0 - Live 0xe0859000
i2c_core 22528 3 eeprom,lm85,i2c_i801, Live 0xe0861000
rtc 12052 0 - Live 0xe0855000

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

# cat /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000f0000-000fffff : System ROM
00100000-1fe2ffff : System RAM
  00100000-002c62ca : Kernel code
  002c62cb-003c9703 : Kernel data
1fe30000-1fe4149f : ACPI Non-volatile Storage
1fe414a0-1ff2ffff : System RAM
1ff30000-1ff3ffff : ACPI Tables
1ff40000-1ffeffff : ACPI Non-volatile Storage
1fff0000-1fffffff : reserved
30000000-300fffff : PCI Bus #03
  30000000-3001ffff : 0000:03:06.0
30100000-301003ff : 0000:00:1f.1
f8000000-fbffffff : 0000:00:00.0
fc900000-fc9fffff : PCI Bus #02
  fc9e0000-fc9fffff : 0000:02:01.0
fca00000-feafffff : PCI Bus #03
  fd000000-fdffffff : 0000:03:06.0
  feafe000-feafefff : 0000:03:08.0
    feafe000-feafefff : e100
  feaff000-feafffff : 0000:03:06.0
febffc00-febfffff : 0000:00:1d.7
fecf0000-fecf0fff : reserved
fed20000-fed9ffff : reserved

# cat /proc/ioports
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0376-0376 : ide1
0378-037a : parport0
037b-037f : parport0
03c0-03df : vga+
03f6-03f6 : ide0
0400-047f : 0000:00:1f.0
  0400-0403 : ACPI PM1a_EVT_BLK
  0404-0405 : ACPI PM1a_CNT_BLK
  0408-040b : ACPI PM_TMR
  0420-0420 : ACPI PM2_CNT_BLK
  0428-042f : ACPI GPE0_BLK
0500-053f : 0000:00:1f.0
0cf8-0cff : PCI conf1
a000-afff : PCI Bus #02
  ac00-ac1f : 0000:02:01.0
b000-bfff : PCI Bus #03
  b800-b8ff : 0000:03:06.0
  bc00-bc3f : 0000:03:08.0
    bc00-bc3f : e100
c800-c81f : 0000:00:1f.3
  c800-c81f : i801_smbus
cc00-cc1f : 0000:00:1d.0
d000-d01f : 0000:00:1d.1
d400-d41f : 0000:00:1d.2
d800-d81f : 0000:00:1d.3
dc00-dc0f : 0000:00:1f.2
  dc00-dc0f : libata
e000-e003 : 0000:00:1f.2
  e000-e003 : libata
e400-e407 : 0000:00:1f.2
  e400-e407 : libata
e800-e803 : 0000:00:1f.2
  e800-e803 : libata
ec00-ec07 : 0000:00:1f.2
  ec00-ec07 : libata
ffa0-ffaf : 0000:00:1f.1
  ffa0-ffa7 : ide0
  ffa8-ffaf : ide1

[8.5.] PCI information ('lspci -vvv' as root)

View attachment "lspci.txt" of type "text/plain" (8930 bytes)

[8.6.] SCSI information (from /proc/scsi/scsi)

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3250823AS      Rev: 3.03
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3250823AS      Rev: 3.03
  Type:   Direct-Access                    ANSI SCSI revision: 05

[8.7.] Other information that might be relevant to the problem
       (please look in /proc and include all information that you
       think to be relevant):

The system runs from three ATA disks in RAID1 (one disk used as a
spare), with some data on those disks too - the rest resides on the two
SATA disks (also RAID1).

The machine acts as a multipurpose server (mail, web, file server). It
has no particularly high load and has never shown this behaviour before.

The entire dmesg output (from kern.log) might be useful, as well as my
.config:

View attachment "kern.log" of type "text/plain" (26252 bytes)

View attachment "config-2.6.18.2" of type "text/plain" (33471 bytes)

I will, of course, post further information if necessary.

Regards
/Rasmus

-- 
Rasmus Bøg Hansen
MSC Aps
Bøgesvinget 8
2740 Skovlunde
44 53 93 66