Message-ID: <026501c72237$0464f7a0$0400a8c0@dcccs>
Date: Mon, 18 Dec 2006 00:56:41 +0100
From: Haar János <djani22@...center.hu>
To: "David Chinner" <dgc@....com>
Cc: <linux-xfs@....sgi.com>, <linux-kernel@...r.kernel.org>
Subject: Re: xfslogd-spinlock bug?
----- Original Message -----
From: "David Chinner" <dgc@....com>
To: "Haar János" <djani22@...center.hu>
Cc: <linux-xfs@....sgi.com>; <linux-kernel@...r.kernel.org>
Sent: Sunday, December 17, 2006 11:44 PM
Subject: Re: xfslogd-spinlock bug?
> On Sat, Dec 16, 2006 at 12:19:45PM +0100, Haar János wrote:
> > Hi
> >
> > I have some news.
> >
> > I don't know whether there is a connection between the two messages, but I
> > can see that the spinlock bug always comes up on CPU #3.
> >
> > Does anybody have any idea?
>
> Your disk interrupts are directed to CPU 3, and so log I/O completion
> occurs on that CPU.
           CPU0       CPU1       CPU2       CPU3
  0:        100          0          0    4583704   IO-APIC-edge      timer
  1:          0          0          0          2   IO-APIC-edge      i8042
  4:          0          0          0    3878668   IO-APIC-edge      serial
  8:          0          0          0          0   IO-APIC-edge      rtc
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          0          0          3   IO-APIC-edge      i8042
 14:    3072118          0          0        181   IO-APIC-edge      ide0
 16:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb2
 18:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 19:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 23:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
 52:          0          0          0  213052723   IO-APIC-fasteoi   eth1
 53:          0          0          0   91913759   IO-APIC-fasteoi   eth2
100:          0          0          0   16776910   IO-APIC-fasteoi   eth0
NMI:      42271      43187      42234      43168
LOC:    4584247    4584219    4584215    4584198
ERR:          0
Maybe....
I have 3 XFS filesystems on this system, on 3 sources:
1. 200GB on one IDE HDD.
2. 2x200GB mirror on 1 IDE + 1 SATA HDD.
3. 4x3.3TB stripe over NBD.
The NBD is served through eth1, which is on CPU3, but ide0 is on CPU0.
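For reference: eth1 is IRQ 52 on this box (see the table above), and the CPU
that services it can be changed by writing a CPU bitmask to
/proc/irq/52/smp_affinity. A minimal, untested sketch (the IRQ number is
specific to my machine), equivalent to "echo 1 > /proc/irq/52/smp_affinity":

#include <stdio.h>

int main(void)
{
    /* bitmask of allowed CPUs: 0x1 = CPU0 only; needs root */
    FILE *f = fopen("/proc/irq/52/smp_affinity", "w");

    if (!f) {
        perror("/proc/irq/52/smp_affinity");
        return 1;
    }
    fputs("1\n", f);
    return fclose(f) == 0 ? 0 : 1;
}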
>
> > Dec 16 12:08:36 dy-base BUG: spinlock bad magic on CPU#3, xfslogd/3/317
> > Dec 16 12:08:36 dy-base general protection fault: 0000 [1]
> > Dec 16 12:08:36 dy-base SMP
> > Dec 16 12:08:36 dy-base
> > Dec 16 12:08:36 dy-base CPU 3
> > Dec 16 12:08:36 dy-base
> > Dec 16 12:08:36 dy-base Modules linked in:
> > Dec 16 12:08:36 dy-base nbd
>
> Are you using XFS on a NBD?
Yes, on source #3.
I have been using it for about 1.5 years.
(The NBD deadlock is fixed on my system, thanks to Herbert Xu, since 2.6.14.)
>
> > Dec 16 12:08:36 dy-base rd
> > Dec 16 12:08:36 dy-base netconsole
> > Dec 16 12:08:36 dy-base e1000
> > Dec 16 12:08:36 dy-base video
> > Dec 16 12:08:36 dy-base
> > Dec 16 12:08:36 dy-base Pid: 317, comm: xfslogd/3 Not tainted 2.6.19 #1
> > Dec 16 12:08:36 dy-base RIP: 0010:[<ffffffff803f3aba>]
> > Dec 16 12:08:36 dy-base [<ffffffff803f3aba>] spin_bug+0x69/0xdf
> > Dec 16 12:08:36 dy-base RSP: 0018:ffff81011fdedbc0 EFLAGS: 00010002
> > Dec 16 12:08:36 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX:
> ^^^^^^^^^^^^^^^^
> Anyone recognise that pattern?
I think I have one idea.
This issue sometimes stops the 5-second automatic restart after a crash,
which points to possible memory corruption, especially if the bug occurs in
the IRQ handling.... :-)
I have a lot of logs about this issue, and RAX and RBX are always the same.
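For the record: 0x6b is POISON_FREE, the byte the kernel's slab debugging
writes over freed objects, so RBX = 6b6b6b6b6b6b6b6b suggests the lock lives
in memory that has already been freed. A small userspace demo of why that
shows up as "bad magic"; the two constants are the kernel's real values, the
struct is a simplified stand-in:

#include <stdio.h>
#include <string.h>

#define SPINLOCK_MAGIC 0xdead4eadu  /* kernel debug-spinlock magic */
#define POISON_FREE    0x6b         /* kernel slab free-poison byte */

/* simplified stand-in for a CONFIG_DEBUG_SPINLOCK spinlock_t */
struct dbg_spinlock {
    unsigned int raw_lock;
    unsigned int magic;             /* verified before every lock op */
};

int main(void)
{
    struct dbg_spinlock lock = { 0, SPINLOCK_MAGIC };

    /* what slab poisoning does to the object on kfree() */
    memset(&lock, POISON_FREE, sizeof(lock));

    if (lock.magic != SPINLOCK_MAGIC)
        printf("BUG: spinlock bad magic (magic=%08x)\n", lock.magic);
    return 0;
}

This prints magic=6b6b6b6b, the same poison pattern as in the oops above.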
>
> > Dec 16 12:08:36 dy-base Call Trace:
> > Dec 16 12:08:36 dy-base [<ffffffff803f3bdc>] _raw_spin_lock+0x23/0xf1
> > Dec 16 12:08:36 dy-base [<ffffffff805e7f2b>] _spin_lock_irqsave+0x11/0x18
> > Dec 16 12:08:36 dy-base [<ffffffff80222aab>] __wake_up+0x22/0x50
> > Dec 16 12:08:36 dy-base [<ffffffff803c97f9>] xfs_buf_unpin+0x21/0x23
> > Dec 16 12:08:36 dy-base [<ffffffff803970a4>] xfs_buf_item_unpin+0x2e/0xa6
>
> This implies a spinlock inside a wait_queue_head_t is corrupt.
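In the kernel, the spinlock is the first member of wait_queue_head_t, and
__wake_up() takes it before touching anything else, which matches the
_spin_lock_irqsave -> __wake_up step in the trace. A simplified userspace
mock of that relationship (not the kernel's exact definitions):

#include <pthread.h>

/* mock of wait_queue_head_t: the embedded lock is the first thing
 * __wake_up() touches, so a freed/poisoned wait queue blows up right
 * at spin_lock_irqsave(), as in the trace above */
struct mock_wait_queue_head {
    pthread_spinlock_t lock;    /* kernel: spinlock_t lock; */
    void *task_list;            /* kernel: list of waiting tasks */
};

static void mock_wake_up(struct mock_wait_queue_head *q)
{
    pthread_spin_lock(&q->lock);    /* first touch of the structure */
    /* ... walk task_list and wake each waiter ... */
    pthread_spin_unlock(&q->lock);
}

int main(void)
{
    struct mock_wait_queue_head q;

    pthread_spin_init(&q.lock, PTHREAD_PROCESS_PRIVATE);
    q.task_list = 0;
    mock_wake_up(&q);    /* fine while q is alive; fatal after free */
    pthread_spin_destroy(&q.lock);
    return 0;
}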
>
> What type of system do you have, and what sort of
> workload are you running?
OS: Fedora 5, 64-bit.
HW: dual Xeon with HT, 4GB RAM.
(min_free_kbytes is set to 128000, because the e1000 driver sometimes runs
out of reserved memory during IRQ handling.)
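That is the standard VM sysctl, set with the equivalent of
"echo 128000 > /proc/sys/vm/min_free_kbytes"; an untested sketch:

#include <stdio.h>

int main(void)
{
    /* raise the VM free-page reserve; needs root */
    FILE *f = fopen("/proc/sys/vm/min_free_kbytes", "w");

    if (!f) {
        perror("/proc/sys/vm/min_free_kbytes");
        return 1;
    }
    fprintf(f, "%d\n", 128000);
    return fclose(f) == 0 ? 0 : 1;
}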
Workload:
I use this system for free web storage.
(2x Apache 2.0.xx, 12x pure-ftpd, 2x MySQL, but SQL only uses the source #2
fs.)
The normal system load is ~20-40, but currently I have a little problem with
Apache: it sometimes starts to read a lot from the big XFS device and eats
all memory, and the load rises to 700-800. At that point I restart httpd and
everything goes back to normal, but if I am offline.....
Thanks a lot!
Janos
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group