lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.0908191149470.12663@gandalf.stny.rr.com>
Date:	Wed, 19 Aug 2009 11:50:37 -0400 (EDT)
From:	Steven Rostedt <rostedt@...dmis.org>
To:	LKML <linux-kernel@...r.kernel.org>
cc:	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [BUG] lockup with the latest kernel


[ Sorry for the repost, but I sent to Ingo's "special" account last time.
  /me needs to add an alias for mingo as well as ingo ]


On Wed, 19 Aug 2009, Steven Rostedt wrote:

> 
> I'm hitting this funny lockup and I'm not sure why. This box is an older 
> i386 SMP box that has two sockets (no dual core).
> 
> $ cat  /proc/cpuinfo 
> processor	: 0
> vendor_id	: AuthenticAMD
> cpu family	: 6
> model		: 10
> model name	: AMD Athlon(tm) MP 2800+
> stepping	: 0
> cpu MHz		: 2133.418
> cache size	: 512 KB
> fdiv_bug	: no
> hlt_bug		: no
> f00f_bug	: no
> coma_bug	: no
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 1
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
> cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow
> bogomips	: 4266.83
> clflush size	: 32
> power management: ts
> 
> processor	: 1
> vendor_id	: AuthenticAMD
> [ same as above ]
> 
> 
> $ lspci -vvv
> [snip]
> 00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI (rev 03)
>         Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI
>         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- 
> ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Kernel driver in use: amd756_smbus
> [snip]
> 
> Here's a few backtraces:
> 
> [13288.222084] BUG: NMI Watchdog detected LOCKUP on CPU0, ip c0110821, 
> registers:
> [13288.222084] Modules linked in: iptable_filter ip_tables x_tables 
> binfmt_misc ppdev lp autofs4 nfsd exportfs nfs lockd nfs_acl auth_rpcgss 
> sunrpc ipv6 sd_mod dm_snapshot dm_mirror dm_region_hash dm_log dm_mod 
> snd_usb_audio snd_usb_lib usbhid usb_storage hid usblp snd_emu10k1_synth 
> snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_emu10k1 
> snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc 
> snd_util_mem snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi 
> snd_seq_midi_event snd_seq snd_timer 3c59x emu10k1_gp snd_seq_device mii 
> gameport ohci_hcd snd ehci_hcd soundcore usbcore shpchp pci_hotplug 
> i2c_amd756 amd_k7_agp i2c_core amd_rng agpgart rng_core container thermal 
> processor thermal_sys button ide_cd_mod evdev cdrom psmouse serio_raw 
> pcspkr parport_pc floppy parport
> [13288.222084] 
> [13288.222084] Pid: 3, comm: migration/0 Not tainted (2.6.30-rc6 #3) 
> Unknown
> [13288.222084] EIP: 0060:[<c0110821>] EFLAGS: 00000002 CPU: 0
> [13288.222084] EIP is at default_send_IPI_mask_logical+0x53/0x92
> [13288.222084] EAX: fffff000 EBX: 000000fd ECX: c042cccc EDX: 00000800
> [13288.222084] ESI: 00000002 EDI: 00000086 EBP: f7065f10 ESP: f7065f00
> [13288.222084]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [13288.222084] Process migration/0 (pid: 3, ti=f7064000 task=f705f0c0 
> task.ti=f7064000)
> [13288.222084] Stack:
> [13288.222084]  c0350550 0000007e efbfea20 c1819c60 f7065f18 c010f1fa 
> f7065f24 c011b69c
> [13288.222084]  c0350490 f7065f2c c011b70b f7065f5c c01213d3 00000000 
> 00000000 00000001
> [13288.222084]  00000000 00000001 00000000 00000092 c5685f08 c5685f44 
> 00000003 f7065f64
> [13288.222084] Call Trace:
> [13288.222084]  [<c010f1fa>] ? native_smp_send_reschedule+0x44/0x46
> [13288.222084]  [<c011b69c>] ? resched_task+0x5f/0x62
> [13288.222084]  [<c011b70b>] ? check_preempt_curr_idle+0x13/0x15
> [13288.222084]  [<c01213d3>] ? try_to_wake_up+0x1f4/0x228
> [13288.222084]  [<c0121417>] ? default_wake_function+0x10/0x12
> [13288.222084]  [<c011a22f>] ? __wake_up_common+0x39/0x61
> [13288.222084]  [<c011b7da>] ? complete+0x30/0x43
> [13288.222084]  [<c0123eaa>] ? migration_thread+0x19d/0x1db
> [13288.222084]  [<c0123d0d>] ? migration_thread+0x0/0x1db
> [13288.222084]  [<c0137411>] ? kthread+0x4b/0x6f
> [13288.222084]  [<c01373c6>] ? kthread+0x0/0x6f
> [13288.222084]  [<c01033cf>] ? kernel_thread_helper+0x7/0x10
> [13288.222084] Code: eb 27 3b c0 e8 56 62 01 00 a1 88 cd 42 c0 83 fb 02 8b 
> 50 20 75 10 89 55 f0 ff 90 b8 00 00 00 8b 55 f0 eb 12 f3 90 a1 f4 f8 42 c0 
> <8b> 80 00 c3 ff ff f6 c4 10 75 ee a1 f4 f8 42 c0 c1 e6 18 2d f0 
> [13288.222084] ---[ end trace 94cd80322ca2a918 ]---
> [13288.222093] BUG: NMI Watchdog detected LOCKUP on CPU1, ip c03492d5, 
> registers:
> [13288.222093] Modules linked in: iptable_filter ip_tables x_tables 
> binfmt_misc ppdev lp autofs4 nfsd exportfs nfs lockd nfs_acl auth_rpcgss 
> sunrpc ipv6 sd_mod dm_snapshot dm_mirror dm_region_hash dm_log dm_mod 
> snd_usb_audio snd_usb_lib usbhid usb_storage hid usblp snd_emu10k1_synth 
> snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_emu10k1 
> snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc 
> snd_util_mem snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi 
> snd_seq_midi_event snd_seq snd_timer 3c59x emu10k1_gp snd_seq_device mii 
> gameport ohci_hcd snd ehci_hcd soundcore usbcore shpchp pci_hotplug 
> i2c_amd756 amd_k7_agp i2c_core amd_rng agpgart rng_core container thermal 
> processor thermal_sys button ide_cd_mod evdev cdrom psmouse serio_raw 
> pcspkr parport_pc floppy parport
> [13288.222093] 
> [13288.222093] Pid: 0, comm: swapper Tainted: G      D    (2.6.30-rc6 #3) 
> Unknown
> [13288.222093] EIP: 0060:[<c03492d5>] EFLAGS: 00000097 CPU: 1
> [13288.222093] EIP is at _spin_lock_irq+0x19/0x1f
> [13288.222093] EAX: c1819c60 EBX: c1819c60 ECX: f708f6b0 EDX: 00007e7d
> [13288.222093] ESI: c04bfc60 EDI: c04bcd18 EBP: f7093f18 ESP: f7093f18
> [13288.222093]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [13288.222093] Process swapper (pid: 0, ti=f7092000 task=f708f500 
> task.ti=f7092000)
> [13288.222093] Stack:
> [13288.222093]  f7093f8c c0347415 f7093f28 c012b204 f708f6b0 c01100ba 
> c04bcd18 c04bfc60
> [13288.222093]  f708f6b4 c1816488 00000000 00000001 00000000 f7093f94 
> f7092000 0000007b
> [13288.222093]  f708f500 000000d8 00000000 c0468e60 00000001 c1816f8c 
> f7093f94 c01424e4
> [13288.222093] Call Trace:
> [13288.222093]  [<c0347415>] ? __schedule+0xc8/0x855
> [13288.222093]  [<c012b204>] ? irq_exit+0x39/0x5c
> [13288.222093]  [<c01100ba>] ? smp_error_interrupt+0x61/0x63
> [13288.222093]  [<c01424e4>] ? tick_nohz_restart_sched_tick+0x2c/0x139
> [13288.222093]  [<c0347baf>] ? schedule+0xd/0x1c
> [13288.222093]  [<c0101c5d>] ? cpu_idle+0x60/0x63
> [13288.222093]  [<c0344d11>] ? start_secondary+0x195/0x19a
> [13288.222093] Code: 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 5d c3 
> 55 89 e5 0f 1f 44 00 00 fa ba 00 01 00 00 f0 66 0f c1 10 38 f2 74 06 f3 90 
> <8a> 10 eb f6 5d c3 55 89 e5 0f 1f 44 00 00 9c 5a fa f0 83 28 01 
> [13288.222093] ---[ end trace 94cd80322ca2a919 ]---
> 
> 
> And another instance:
> 
> :
> [45012.674548] Modules linked in: vfat fat iptable_filter ip_tables 
> x_tables binfmt_misc ppdev lp autofs4 nfsd exportfs nfs lockd nfs_acl 
> auth_rpcgss sunrpc ipv6 sd_mod dm_snapshot dm_mirror dm_region_hash dm_log 
> dm_mod snd_usb_audio snd_usb_lib usbhid usb_storage hid usblp 
> snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul 
> snd_emu10k1 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm 
> snd_page_alloc snd_util_mem snd_hwdep ohci_hcd snd_seq_dummy ehci_hcd 
> snd_seq_oss snd_seq_midi snd_rawmidi usbcore snd_seq_midi_event snd_seq 
> 3c59x snd_timer snd_seq_device snd nls_base soundcore emu10k1_gp gameport 
> amd_k7_agp i2c_amd756 mii i2c_core agpgart ide_cd_mod thermal psmouse 
> amd_rng shpchp button pci_hotplug container pcspkr cdrom rng_core evdev 
> serio_raw processor thermal_sys parport_pc parport floppy
> [45012.674548] 
> [45012.674548] Pid: 0, comm: swapper Not tainted (2.6.31-rc6 #9) Unknown
> [45012.674548] EIP: 0060:[<c1267059>] EFLAGS: 00000097 CPU: 1
> [45012.674548] EIP is at _spin_lock_irq+0x15/0x1f
> [45012.674548] EAX: c1c9a200 EBX: c1c9a200 ECX: f708f550 EDX: 0000a7a6
> [45012.674548] ESI: 00000001 EDI: f708f708 EBP: f7093f1c ESP: f7093f1c
> [45012.674548]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [45012.674548] Process swapper (pid: 0, ti=f7092000 task=f708f550 
> task.ti=f7092000)
> [45012.674548] Stack:
> [45012.674548]  f7093f94 c1265404 c1032f33 f7093f30 c13f0200 f708f704 
> c10032e6 c13ec1f4
> [45012.674548] <0> f708f708 c13f0200 00000001 00000000 f7093f94 f7092000 
> 00000001 0000007b
> [45012.674548] <0> 000000d8 f708f550 c1395288 00000001 c1c9648c f7093f94 
> c104babe f7092000
> [45012.674548] Call Trace:
> [45012.674548]  [<c1265404>] ? schedule+0xca/0x86c
> [45012.674548]  [<c1032f33>] ? irq_exit+0x39/0x5c
> [45012.674548]  [<c10032e6>] ? error_interrupt+0x2a/0x30
> [45012.674548]  [<c104babe>] ? tick_nohz_restart_sched_tick+0x2c/0x139
> [45012.674548]  [<c1001c15>] ? cpu_idle+0x60/0x63
> [45012.674548]  [<c1262bc7>] ? start_secondary+0x195/0x19a
> [45012.674548] Code: 01 00 00 f0 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 
> 89 c8 c9 c3 55 89 e5 0f 1f 44 00 00 fa ba 00 01 00 00 f0 66 0f c1 10 38 f2 
> <74> 06 f3 90 8a 10 eb f6 c9 c3 55 89 e5 0f 1f 44 00 00 9c 5a fa 
> [45012.674548] ---[ end trace 7323d2c20b35b0b0 ]---
> [45012.678548] BUG: NMI Watchdog detected LOCKUP on CPU0, ip c1013ff8, 
> registers:
> [45012.678548] Modules linked in: vfat fat iptable_filter ip_tables 
> x_tables binfmt_misc ppdev lp autofs4 nfsd exportfs nfs lockd nfs_acl 
> auth_rpcgss sunrpc ipv6 sd_mod dm_snapshot dm_mirror dm_region_hash dm_log 
> dm_mod snd_usb_audio snd_usb_lib usbhid usb_storage hid usblp 
> snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul 
> snd_emu10k1 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm 
> snd_page_alloc snd_util_mem snd_hwdep ohci_hcd snd_seq_dummy ehci_hcd 
> snd_seq_oss snd_seq_midi snd_rawmidi usbcore snd_seq_midi_event snd_seq 
> 3c59x snd_timer snd_seq_device snd nls_base soundcore emu10k1_gp gameport 
> amd_k7_agp i2c_amd756 mii i2c_core agpgart ide_cd_mod thermal psmouse 
> amd_rng shpchp button pci_hotplug container pcspkr cdrom rng_core evdev 
> serio_raw processor thermal_sys parport_pc parport floppy
> [45012.678548] 
> [45012.678548] Pid: 9294, comm: backup2l Tainted: G      D    (2.6.31-rc6 
> #9) Unknown
> [45012.678548] EIP: 0060:[<c1013ff8>] EFLAGS: 00000002 CPU: 0
> [45012.678548] EIP is at default_send_IPI_mask_logical+0x5c/0x90
> [45012.678548] EAX: 000018f0 EBX: 00000082 ECX: c1356b5c EDX: 00000800
> [45012.678548] ESI: 000000fd EDI: 00000002 EBP: ceaf7e18 ESP: ceaf7e08
> [45012.678548]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [45012.678548] Process backup2l (pid: 9294, ti=ceaf6000 task=db276eb0 
> task.ti=ceaf6000)
> [45012.678548] Stack:
> [45012.678548]  c126fcc8 000000a7 db13e9c0 c1c9a200 ceaf7e20 c10127ce 
> ceaf7e2c c1020ebf
> [45012.678548] <0> c126fc08 ceaf7e34 c1020f2e ceaf7e64 c1029d20 00000000 
> 00000000 00000001
> [45012.678548] <0> 00000001 00000001 00000000 00000086 ce995e40 e1cee7f8 
> 00000001 ceaf7e6c
> [45012.678548] Call Trace:
> [45012.678548]  [<c10127ce>] ? native_smp_send_reschedule+0x44/0x46
> [45012.678548]  [<c1020ebf>] ? resched_task+0x5f/0x62
> [45012.678548]  [<c1020f2e>] ? check_preempt_curr_idle+0x13/0x15
> [45012.678548]  [<c1029d20>] ? try_to_wake_up+0x1f7/0x22b
> [45012.678548]  [<c1029d64>] ? default_wake_function+0x10/0x12
> [45012.678548]  [<c1040645>] ? autoremove_wake_function+0x14/0x34
> [45012.678548]  [<c101e110>] ? __wake_up_common+0x39/0x61
> [45012.678548]  [<c1021048>] ? __wake_up_sync_key+0x38/0x4a
> [45012.678548]  [<c1021069>] ? __wake_up_sync+0xf/0x12
> [45012.678548]  [<c10b0fd8>] ? pipe_release+0x5e/0x92
> [45012.678548]  [<c10b103c>] ? pipe_write_release+0x14/0x16
> [45012.678548]  [<c10ac42e>] ? __fput+0xcf/0x177
> [45012.678548]  [<c10ac4f0>] ? fput+0x1a/0x1c
> [45012.678548]  [<c10a982d>] ? filp_close+0x56/0x60
> [45012.678548]  [<c102f56b>] ? put_files_struct+0x5d/0xa1
> [45012.678548]  [<c102f5ea>] ? exit_files+0x3b/0x40
> [45012.678548]  [<c1030ca5>] ? do_exit+0x1ae/0x5b3
> [45012.678548]  [<c103110c>] ? do_group_exit+0x62/0x89
> [45012.678548]  [<c103114b>] ? sys_exit_group+0x18/0x1c
> [45012.678548]  [<c1002944>] ? sysenter_do_call+0x12/0x22
> [45012.678548] Code: a1 18 6c 35 c1 83 fe 02 8b 50 20 75 10 89 55 f0 ff 90 
> b8 00 00 00 8b 55 f0 eb 12 f3 90 a1 d4 97 35 c1 8b 80 00 c3 ff ff f6 c4 10 
> <75> ee a1 d4 97 35 c1 c1 e7 18 2d f0 3c 00 00 89 38 89 f0 09 d0 
> [45012.678548] ---[ end trace 7323d2c20b35b0b1 ]---
> [45014.223303] Clocksource tsc unstable (delta = 9374024843 ns)
> [45014.229055] Fixing recursive fault but reboot is needed!
> 
> 
> Always happens where one CPU is sending an IPI and the other has the rq 
> spinlock. Seems to be that the IPI expects the other CPU to not have 
> interrupts disabled or something?
> 
> Note, I've seen this on 2.6.30-rc6 as well (yes that's 2.6.30). But this 
> does not happen on 2.6.29. Unfortunately, 2.6.29 makes my NIC go kaputt 
> for some reason.
> 
> I've enabled LOCKDEP and it just makes the bug trigger easier.
> 
> Anyway, anyone have any ideas?
> 
> Since this can take up to a week to trigger, I'll start doing a git bisect 
> on it from 2.6.29 to 2.6.30-rc6. And see if I can find the change that 
> broke. Although this box is my DHCP/web/mail server, so it will be a PITA 
> :-(
> 
> Thanks,
> 
> -- Steve
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ