[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.0908191149470.12663@gandalf.stny.rr.com>
Date: Wed, 19 Aug 2009 11:50:37 -0400 (EDT)
From: Steven Rostedt <rostedt@...dmis.org>
To: LKML <linux-kernel@...r.kernel.org>
cc: Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>
Subject: Re: [BUG] lockup with the latest kernel
[ Sorry for the repost, but I sent to Ingo's "special" account last time.
/me needs to add an alias for mingo as well as ingo ]
On Wed, 19 Aug 2009, Steven Rostedt wrote:
>
> I'm hitting this funny lockup and I'm not sure why. This box is an older
> i386 SMP box that has two sockets (no dual core).
>
> $ cat /proc/cpuinfo
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 6
> model : 10
> model name : AMD Athlon(tm) MP 2800+
> stepping : 0
> cpu MHz : 2133.418
> cache size : 512 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow
> bogomips : 4266.83
> clflush size : 32
> power management: ts
>
> processor : 1
> vendor_id : AuthenticAMD
> [ same as above ]
>
>
> $ lspci -vvv
> [snip]
> 00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI (rev 03)
> Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI
> Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
> Kernel driver in use: amd756_smbus
> [snip]
>
> Here's a few backtraces:
>
> [13288.222084] BUG: NMI Watchdog detected LOCKUP on CPU0, ip c0110821,
> registers:
> [13288.222084] Modules linked in: iptable_filter ip_tables x_tables
> binfmt_misc ppdev lp autofs4 nfsd exportfs nfs lockd nfs_acl auth_rpcgss
> sunrpc ipv6 sd_mod dm_snapshot dm_mirror dm_region_hash dm_log dm_mod
> snd_usb_audio snd_usb_lib usbhid usb_storage hid usblp snd_emu10k1_synth
> snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_emu10k1
> snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc
> snd_util_mem snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi
> snd_seq_midi_event snd_seq snd_timer 3c59x emu10k1_gp snd_seq_device mii
> gameport ohci_hcd snd ehci_hcd soundcore usbcore shpchp pci_hotplug
> i2c_amd756 amd_k7_agp i2c_core amd_rng agpgart rng_core container thermal
> processor thermal_sys button ide_cd_mod evdev cdrom psmouse serio_raw
> pcspkr parport_pc floppy parport
> [13288.222084]
> [13288.222084] Pid: 3, comm: migration/0 Not tainted (2.6.30-rc6 #3)
> Unknown
> [13288.222084] EIP: 0060:[<c0110821>] EFLAGS: 00000002 CPU: 0
> [13288.222084] EIP is at default_send_IPI_mask_logical+0x53/0x92
> [13288.222084] EAX: fffff000 EBX: 000000fd ECX: c042cccc EDX: 00000800
> [13288.222084] ESI: 00000002 EDI: 00000086 EBP: f7065f10 ESP: f7065f00
> [13288.222084] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [13288.222084] Process migration/0 (pid: 3, ti=f7064000 task=f705f0c0
> task.ti=f7064000)
> [13288.222084] Stack:
> [13288.222084] c0350550 0000007e efbfea20 c1819c60 f7065f18 c010f1fa
> f7065f24 c011b69c
> [13288.222084] c0350490 f7065f2c c011b70b f7065f5c c01213d3 00000000
> 00000000 00000001
> [13288.222084] 00000000 00000001 00000000 00000092 c5685f08 c5685f44
> 00000003 f7065f64
> [13288.222084] Call Trace:
> [13288.222084] [<c010f1fa>] ? native_smp_send_reschedule+0x44/0x46
> [13288.222084] [<c011b69c>] ? resched_task+0x5f/0x62
> [13288.222084] [<c011b70b>] ? check_preempt_curr_idle+0x13/0x15
> [13288.222084] [<c01213d3>] ? try_to_wake_up+0x1f4/0x228
> [13288.222084] [<c0121417>] ? default_wake_function+0x10/0x12
> [13288.222084] [<c011a22f>] ? __wake_up_common+0x39/0x61
> [13288.222084] [<c011b7da>] ? complete+0x30/0x43
> [13288.222084] [<c0123eaa>] ? migration_thread+0x19d/0x1db
> [13288.222084] [<c0123d0d>] ? migration_thread+0x0/0x1db
> [13288.222084] [<c0137411>] ? kthread+0x4b/0x6f
> [13288.222084] [<c01373c6>] ? kthread+0x0/0x6f
> [13288.222084] [<c01033cf>] ? kernel_thread_helper+0x7/0x10
> [13288.222084] Code: eb 27 3b c0 e8 56 62 01 00 a1 88 cd 42 c0 83 fb 02 8b
> 50 20 75 10 89 55 f0 ff 90 b8 00 00 00 8b 55 f0 eb 12 f3 90 a1 f4 f8 42 c0
> <8b> 80 00 c3 ff ff f6 c4 10 75 ee a1 f4 f8 42 c0 c1 e6 18 2d f0
> [13288.222084] ---[ end trace 94cd80322ca2a918 ]---
> [13288.222093] BUG: NMI Watchdog detected LOCKUP on CPU1, ip c03492d5,
> registers:
> [13288.222093] Modules linked in: iptable_filter ip_tables x_tables
> binfmt_misc ppdev lp autofs4 nfsd exportfs nfs lockd nfs_acl auth_rpcgss
> sunrpc ipv6 sd_mod dm_snapshot dm_mirror dm_region_hash dm_log dm_mod
> snd_usb_audio snd_usb_lib usbhid usb_storage hid usblp snd_emu10k1_synth
> snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_emu10k1
> snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc
> snd_util_mem snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi
> snd_seq_midi_event snd_seq snd_timer 3c59x emu10k1_gp snd_seq_device mii
> gameport ohci_hcd snd ehci_hcd soundcore usbcore shpchp pci_hotplug
> i2c_amd756 amd_k7_agp i2c_core amd_rng agpgart rng_core container thermal
> processor thermal_sys button ide_cd_mod evdev cdrom psmouse serio_raw
> pcspkr parport_pc floppy parport
> [13288.222093]
> [13288.222093] Pid: 0, comm: swapper Tainted: G D (2.6.30-rc6 #3)
> Unknown
> [13288.222093] EIP: 0060:[<c03492d5>] EFLAGS: 00000097 CPU: 1
> [13288.222093] EIP is at _spin_lock_irq+0x19/0x1f
> [13288.222093] EAX: c1819c60 EBX: c1819c60 ECX: f708f6b0 EDX: 00007e7d
> [13288.222093] ESI: c04bfc60 EDI: c04bcd18 EBP: f7093f18 ESP: f7093f18
> [13288.222093] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [13288.222093] Process swapper (pid: 0, ti=f7092000 task=f708f500
> task.ti=f7092000)
> [13288.222093] Stack:
> [13288.222093] f7093f8c c0347415 f7093f28 c012b204 f708f6b0 c01100ba
> c04bcd18 c04bfc60
> [13288.222093] f708f6b4 c1816488 00000000 00000001 00000000 f7093f94
> f7092000 0000007b
> [13288.222093] f708f500 000000d8 00000000 c0468e60 00000001 c1816f8c
> f7093f94 c01424e4
> [13288.222093] Call Trace:
> [13288.222093] [<c0347415>] ? __schedule+0xc8/0x855
> [13288.222093] [<c012b204>] ? irq_exit+0x39/0x5c
> [13288.222093] [<c01100ba>] ? smp_error_interrupt+0x61/0x63
> [13288.222093] [<c01424e4>] ? tick_nohz_restart_sched_tick+0x2c/0x139
> [13288.222093] [<c0347baf>] ? schedule+0xd/0x1c
> [13288.222093] [<c0101c5d>] ? cpu_idle+0x60/0x63
> [13288.222093] [<c0344d11>] ? start_secondary+0x195/0x19a
> [13288.222093] Code: 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 5d c3
> 55 89 e5 0f 1f 44 00 00 fa ba 00 01 00 00 f0 66 0f c1 10 38 f2 74 06 f3 90
> <8a> 10 eb f6 5d c3 55 89 e5 0f 1f 44 00 00 9c 5a fa f0 83 28 01
> [13288.222093] ---[ end trace 94cd80322ca2a919 ]---
>
>
> And another instance:
>
> :
> [45012.674548] Modules linked in: vfat fat iptable_filter ip_tables
> x_tables binfmt_misc ppdev lp autofs4 nfsd exportfs nfs lockd nfs_acl
> auth_rpcgss sunrpc ipv6 sd_mod dm_snapshot dm_mirror dm_region_hash dm_log
> dm_mod snd_usb_audio snd_usb_lib usbhid usb_storage hid usblp
> snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul
> snd_emu10k1 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm
> snd_page_alloc snd_util_mem snd_hwdep ohci_hcd snd_seq_dummy ehci_hcd
> snd_seq_oss snd_seq_midi snd_rawmidi usbcore snd_seq_midi_event snd_seq
> 3c59x snd_timer snd_seq_device snd nls_base soundcore emu10k1_gp gameport
> amd_k7_agp i2c_amd756 mii i2c_core agpgart ide_cd_mod thermal psmouse
> amd_rng shpchp button pci_hotplug container pcspkr cdrom rng_core evdev
> serio_raw processor thermal_sys parport_pc parport floppy
> [45012.674548]
> [45012.674548] Pid: 0, comm: swapper Not tainted (2.6.31-rc6 #9) Unknown
> [45012.674548] EIP: 0060:[<c1267059>] EFLAGS: 00000097 CPU: 1
> [45012.674548] EIP is at _spin_lock_irq+0x15/0x1f
> [45012.674548] EAX: c1c9a200 EBX: c1c9a200 ECX: f708f550 EDX: 0000a7a6
> [45012.674548] ESI: 00000001 EDI: f708f708 EBP: f7093f1c ESP: f7093f1c
> [45012.674548] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [45012.674548] Process swapper (pid: 0, ti=f7092000 task=f708f550
> task.ti=f7092000)
> [45012.674548] Stack:
> [45012.674548] f7093f94 c1265404 c1032f33 f7093f30 c13f0200 f708f704
> c10032e6 c13ec1f4
> [45012.674548] <0> f708f708 c13f0200 00000001 00000000 f7093f94 f7092000
> 00000001 0000007b
> [45012.674548] <0> 000000d8 f708f550 c1395288 00000001 c1c9648c f7093f94
> c104babe f7092000
> [45012.674548] Call Trace:
> [45012.674548] [<c1265404>] ? schedule+0xca/0x86c
> [45012.674548] [<c1032f33>] ? irq_exit+0x39/0x5c
> [45012.674548] [<c10032e6>] ? error_interrupt+0x2a/0x30
> [45012.674548] [<c104babe>] ? tick_nohz_restart_sched_tick+0x2c/0x139
> [45012.674548] [<c1001c15>] ? cpu_idle+0x60/0x63
> [45012.674548] [<c1262bc7>] ? start_secondary+0x195/0x19a
> [45012.674548] Code: 01 00 00 f0 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6
> 89 c8 c9 c3 55 89 e5 0f 1f 44 00 00 fa ba 00 01 00 00 f0 66 0f c1 10 38 f2
> <74> 06 f3 90 8a 10 eb f6 c9 c3 55 89 e5 0f 1f 44 00 00 9c 5a fa
> [45012.674548] ---[ end trace 7323d2c20b35b0b0 ]---
> [45012.678548] BUG: NMI Watchdog detected LOCKUP on CPU0, ip c1013ff8,
> registers:
> [45012.678548] Modules linked in: vfat fat iptable_filter ip_tables
> x_tables binfmt_misc ppdev lp autofs4 nfsd exportfs nfs lockd nfs_acl
> auth_rpcgss sunrpc ipv6 sd_mod dm_snapshot dm_mirror dm_region_hash dm_log
> dm_mod snd_usb_audio snd_usb_lib usbhid usb_storage hid usblp
> snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul
> snd_emu10k1 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm
> snd_page_alloc snd_util_mem snd_hwdep ohci_hcd snd_seq_dummy ehci_hcd
> snd_seq_oss snd_seq_midi snd_rawmidi usbcore snd_seq_midi_event snd_seq
> 3c59x snd_timer snd_seq_device snd nls_base soundcore emu10k1_gp gameport
> amd_k7_agp i2c_amd756 mii i2c_core agpgart ide_cd_mod thermal psmouse
> amd_rng shpchp button pci_hotplug container pcspkr cdrom rng_core evdev
> serio_raw processor thermal_sys parport_pc parport floppy
> [45012.678548]
> [45012.678548] Pid: 9294, comm: backup2l Tainted: G D (2.6.31-rc6
> #9) Unknown
> [45012.678548] EIP: 0060:[<c1013ff8>] EFLAGS: 00000002 CPU: 0
> [45012.678548] EIP is at default_send_IPI_mask_logical+0x5c/0x90
> [45012.678548] EAX: 000018f0 EBX: 00000082 ECX: c1356b5c EDX: 00000800
> [45012.678548] ESI: 000000fd EDI: 00000002 EBP: ceaf7e18 ESP: ceaf7e08
> [45012.678548] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [45012.678548] Process backup2l (pid: 9294, ti=ceaf6000 task=db276eb0
> task.ti=ceaf6000)
> [45012.678548] Stack:
> [45012.678548] c126fcc8 000000a7 db13e9c0 c1c9a200 ceaf7e20 c10127ce
> ceaf7e2c c1020ebf
> [45012.678548] <0> c126fc08 ceaf7e34 c1020f2e ceaf7e64 c1029d20 00000000
> 00000000 00000001
> [45012.678548] <0> 00000001 00000001 00000000 00000086 ce995e40 e1cee7f8
> 00000001 ceaf7e6c
> [45012.678548] Call Trace:
> [45012.678548] [<c10127ce>] ? native_smp_send_reschedule+0x44/0x46
> [45012.678548] [<c1020ebf>] ? resched_task+0x5f/0x62
> [45012.678548] [<c1020f2e>] ? check_preempt_curr_idle+0x13/0x15
> [45012.678548] [<c1029d20>] ? try_to_wake_up+0x1f7/0x22b
> [45012.678548] [<c1029d64>] ? default_wake_function+0x10/0x12
> [45012.678548] [<c1040645>] ? autoremove_wake_function+0x14/0x34
> [45012.678548] [<c101e110>] ? __wake_up_common+0x39/0x61
> [45012.678548] [<c1021048>] ? __wake_up_sync_key+0x38/0x4a
> [45012.678548] [<c1021069>] ? __wake_up_sync+0xf/0x12
> [45012.678548] [<c10b0fd8>] ? pipe_release+0x5e/0x92
> [45012.678548] [<c10b103c>] ? pipe_write_release+0x14/0x16
> [45012.678548] [<c10ac42e>] ? __fput+0xcf/0x177
> [45012.678548] [<c10ac4f0>] ? fput+0x1a/0x1c
> [45012.678548] [<c10a982d>] ? filp_close+0x56/0x60
> [45012.678548] [<c102f56b>] ? put_files_struct+0x5d/0xa1
> [45012.678548] [<c102f5ea>] ? exit_files+0x3b/0x40
> [45012.678548] [<c1030ca5>] ? do_exit+0x1ae/0x5b3
> [45012.678548] [<c103110c>] ? do_group_exit+0x62/0x89
> [45012.678548] [<c103114b>] ? sys_exit_group+0x18/0x1c
> [45012.678548] [<c1002944>] ? sysenter_do_call+0x12/0x22
> [45012.678548] Code: a1 18 6c 35 c1 83 fe 02 8b 50 20 75 10 89 55 f0 ff 90
> b8 00 00 00 8b 55 f0 eb 12 f3 90 a1 d4 97 35 c1 8b 80 00 c3 ff ff f6 c4 10
> <75> ee a1 d4 97 35 c1 c1 e7 18 2d f0 3c 00 00 89 38 89 f0 09 d0
> [45012.678548] ---[ end trace 7323d2c20b35b0b1 ]---
> [45014.223303] Clocksource tsc unstable (delta = 9374024843 ns)
> [45014.229055] Fixing recursive fault but reboot is needed!
>
>
> Always happens where one CPU is sending an IPI and the other has the rq
> spinlock. Seems to be that the IPI expects the other CPU to not have
> interrupts disabled or something?
>
> Note, I've seen this on 2.6.30-rc6 as well (yes that's 2.6.30). But this
> does not happen on 2.6.29. Unfortunately, 2.6.29 makes my NIC go kaputt
> for some reason.
>
> I've enabled LOCKDEP and it just makes the bug trigger easier.
>
> Anyway, anyone have any ideas?
>
> Since this can take up to a week to trigger, I'll start doing a git bisect
> on it from 2.6.29 to 2.6.30-rc6. And see if I can find the change that
> broke. Although this box is my DHCP/web/mail server, so it will be a PITA
> :-(
>
> Thanks,
>
> -- Steve
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists