Message-ID: <20061117152722.GP28000@renesys.com>
Date: Fri, 17 Nov 2006 10:27:22 -0500
From: John Rouillard <rouilj@...esys.com>
To: linux-kernel@...r.kernel.org
Subject: kernel oops: assertion failure at journal:576 (ext3 issue?)
Hello all:
We have a few (3) systems that are crashing with:
Assertion failure in journal_next_log_block() at fs/jbd/journal.c:576:
"journal->j_free > 1"
Kernel BUG at journal:576
invalid operand: 0000 [1] SMP
CPU 1
Modules linked in:
md5 ipv6 parport_pc lp parport w83627hf eeprom adm1026 hwmon_vid hwmon
i2c_sensor i2c_isa i2c_amd756 i2c_amd8111 i2c_dev i2c_core nfs lockd
nfs_acl sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter
ip_tables button battery ac ohci_hcd hw_random tg3 floppy dm_snapshot
dm_zero dm_mirror ext3 jbd dm_mod 3w_9xxx sata_mv libata sd_mod
scsi_mod
Pid: 1603, comm: kjournald Not tainted 2.6.9-42.0.3.ELsmp
RIP: 0010:[<ffffffffa006c18a>]
<ffffffffa006c18a>{:jbd:journal_next_log_block+76}
RSP: 0018:0000010476327b88 EFLAGS: 00010212
RAX: 0000000000000060 RBX: 0000010283163e00 RCX: ffffffff803e1fe8
RDX: ffffffff803e1fe8 RSI: 0000000000000246 RDI: ffffffff803e1fe0
RBP: 0000000000000040 R08: ffffffff803e1fe8 R09: 0000010283163e00
R10: 0000000100000000 R11: ffffffff8011e884 R12: 0000010283163e24
R13: 0000010476327be0 R14: 0000010283163e00 R15: 000000000000002e
FS: 0000002a95560b00(0000) GS:ffffffff804e5200(0000)
knlGS:00000000f7ff36c0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a9556c000 CR3: 0000000037e42000 CR4: 00000000000006e0
Process kjournald (pid: 1603, threadinfo 0000010476326000, task
0000010478d777f0)
Stack: 0000010453f4afa8 0000010310072240 0000000000000040
0000010147528be0
000001044240a880 ffffffffa0067dfe 00000e7c00000000
00000101c33f2184
0000000000000000 0000010310b12f50
Call Trace:<ffffffffa0067dfe>{:jbd:journal_commit_transaction+1834}
<ffffffff80135756>{autoremove_wake_function+0}
<ffffffff80135756>{autoremove_wake_function+0}
<ffffffffa006a914>{:jbd:kjournald+250}
<ffffffff80135756>{autoremove_wake_function+0}
<ffffffff80135756>{autoremove_wake_function+0}
<ffffffffa006a814>{:jbd:commit_timeout+0}
<ffffffff80110f47>{child_rip+8}
<ffffffffa006a81a>{:jbd:kjournald+0}
<ffffffff80110f3f>{child_rip+0}
Code: 0f 0b bd e2 06 a0 ff ff ff ff 40 02 48 8b ab 18 01 00 00 48
RIP <ffffffffa006c18a>{:jbd:journal_next_log_block+76} RSP
<0000010476327b88>
<0>Kernel panic - not syncing: Oops
(Note: I edited together some lines in the "Modules linked in"
section. The rest is cut from the serial console (size 80x24) on the
system.)
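As a cross-check on the trace above, the first bytes of the Code: line
can be decoded by hand. The sketch below is in Python and assumes the
x86_64 BUG() layout used by 2.6-era kernels: a two-byte ud2 opcode
(0f 0b), followed by an 8-byte pointer to the file-name string and a
2-byte line number. That layout is an assumption taken from the
kernel's bug_frame structure, not something stated in the console
output, and decode_bug_frame is a hypothetical helper name.

```python
import struct

# "Code:" bytes copied verbatim from the oops above.
CODE_LINE = "0f 0b bd e2 06 a0 ff ff ff ff 40 02 48 8b ab 18 01 00 00 48"


def decode_bug_frame(code_line):
    """Decode an assumed x86_64 bug_frame: ud2, filename pointer, line.

    Assumption: the bytes are laid out as
        2 bytes  ud2 opcode (0f 0b)
        8 bytes  little-endian pointer to the file-name string
        2 bytes  little-endian line number
    """
    raw = bytes.fromhex(code_line.replace(" ", ""))
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("Code bytes do not start with ud2 (0f 0b)")
    # "<QH" = little-endian, unsigned 64-bit then unsigned 16-bit, no padding.
    filename_ptr, line = struct.unpack_from("<QH", raw, 2)
    return filename_ptr, line


if __name__ == "__main__":
    ptr, line = decode_bug_frame(CODE_LINE)
    print("file-name pointer: 0x%016x" % ptr)
    print("line number: %d" % line)
```

If the assumed layout holds, the decoded line number is 576, which
matches the assertion message, and the file-name pointer falls in the
same address range as the jbd module symbols in the call trace.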
We are running the CentOS 4.4 kernel. uname -a shows:
Linux cook05 2.6.9-42.0.3.ELsmp #1 SMP Fri Oct 6 06:28:26 CDT 2006
x86_64 x86_64 x86_64 GNU/Linux
The disk subsystem on the system that produced this crash is 4 SATA
disks on a 3ware 9550 (see the attached dmesg output for more info)
with a mix of Western Digital and Seagate drives. It has also crashed
with sysrq enabled and (not surprisingly) the system is totally dead;
we have to power cycle it to reboot it.
Other systems experiencing the same crash have:
* the non-SMP version of the same kernel with the software md RAID
drivers
* the same kernel with a MegaRAID RAID card
The same crash has also been seen with an earlier kernel version
2.6.9-42.ELsmp.
It seems to crash when we expect the system to be under high I/O load,
but we don't have any hard measurements of throughput/transactions to
disk to support that.
We can try setting up a remote kernel dump if that would be
useful/would work.
We get a crash every couple of days on average (sometimes two crashes
with 30 min-2 hours between them) so we can try applying patches/new
kernels if needed and see how the system does.
I have attached selected lines from dmesg to give some additional info
about the hardware and config of the system. I tried to attach
/proc/kallsyms from the system as requested by the mailing list FAQ
at: http://www.tux.org/lkml/#s4-3. However, it has been two days since
I originally sent that email and I haven't seen it arrive in the
archives, so that info is available on request. The dmesg info is
from a post-crash boot that should be identical to the pre-crash boot.
If you require more/different information just let me know and I will
try to obtain it.
Thank you for your help.
--
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-643-9300 x 111
View attachment "cook05.dmesg_selected.txt" of type "text/plain" (6190 bytes)