Message-Id: <50C21406020000A10000DCE2@gwsmtp1.uni-regensburg.de>
Date: Fri, 07 Dec 2012 16:06:30 +0100
From: "Ulrich Windl" <Ulrich.Windl@...uni-regensburg.de>
To: <linux-kernel@...r.kernel.org>
Subject: ext3 corruption in 3.0 kernel (SLES11 SP2 x86_64 (AMD Opteron))
Hi!
I thought I'd let you know of two ext3 corruptions found on an AMD Opteron server running SLES11 SP2 (kernel-xen-3.0.42-0.7.3). The corruptions occurred at different times, in different files, on different machines: too much to be ignored.
The older one looked like this:
[75548.267404] EXT3-fs error (device dm-0): htree_dirblock_to_tree: bad entry in directory #205978: rec_len % 4 != 0 - offset=4096, inode=2531699, rec_len=41331, name_len=38
And a more recent one looks like this:
kernel: [261958.359401] EXT3-fs error (device dm-0): ext3_add_entry: bad entry in directory #85582: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
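Both messages are consistent with the sanity checks applied to a directory entry when it is read. The following is a minimal Python sketch of such checks, assuming they run in the order shown (as ext3_check_dir_entry in the kernel source appears to do) and assuming a 4096-byte block size. The function names are illustrative and this is not the kernel's code:

```python
# Illustrative sketch only: the check order and the 4096-byte block size
# are assumptions; the function names are hypothetical.

def dir_rec_len(name_len):
    """Smallest valid record length: 8-byte header plus name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(offset, inode, rec_len, name_len,
                    blocksize=4096, inodes_count=None):
    """Return the first failing check as a string, or None if the entry looks sane."""
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    # The logged offset is relative to the directory; reduce it to the block.
    if (offset % blocksize) + rec_len > blocksize:
        return "directory entry across blocks"
    # Only checked when the filesystem's inode count is known.
    if inodes_count is not None and inode > inodes_count:
        return "inode out of bounds"
    return None


# Values taken from the two log lines above.
print(check_dir_entry(offset=4096, inode=2531699, rec_len=41331, name_len=38))
print(check_dir_entry(offset=0, inode=0, rec_len=0, name_len=0))
```

With the values from the log lines, the first entry fails the alignment check (41331 is not a multiple of 4) and the second fails the minimum-length check (an all-zero entry), matching the two reported messages.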
As the nodes are running the Xen VMM in a cluster, it's possible that a node is reset at any time (fencing), but I thought a journaling filesystem would either not allow corruption or fix it.
In both cases I found the problem when a file could not be created, as in this RPM error message:
Error: RPM failed: error: unpacking of archive failed on file /lib/modules/3.0.42-0.7-default/kernel/drivers/media/video/cpia2/cpia2.ko;50c1fafd: cpio: open failed - Input/output error
After a reset I had to repair the filesystem manually, with these types of errors:
Inode 248552 was part of the orphaned inode list. FIXED.
Block bitmap differences:
Free blocks count wrong for group
After repair and reboot I still saw:
kernel: [ 698.061916] EXT3-fs error (device dm-0): ext3_lookup: deleted inode referenced: 68710
kernel: [ 698.061916] EXT3-fs error (device dm-0): ext3_lookup: deleted inode referenced: 68711
(dm-0 is the root Logical Volume)
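To compare such errors across nodes, the kernel log lines can be tallied per device and reporting function. This is a rough Python sketch, assuming the message layout shown in the log lines above; the function name is hypothetical:

```python
import re
from collections import Counter

# Assumed layout, e.g.:
#   kernel: [ 698.061916] EXT3-fs error (device dm-0): ext3_lookup: deleted inode referenced: 68710
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<dev>[^)]+)\): (?P<func>\w+): (?P<msg>.*)"
)


def tally_ext3_errors(lines):
    """Count EXT3-fs errors per (device, reporting function)."""
    counts = Counter()
    for line in lines:
        match = EXT3_ERROR.search(line)
        if match:
            counts[(match.group("dev"), match.group("func"))] += 1
    return counts


# Sample lines copied from this report.
sample = [
    "kernel: [ 698.061916] EXT3-fs error (device dm-0): ext3_lookup: deleted inode referenced: 68710",
    "kernel: [ 698.061916] EXT3-fs error (device dm-0): ext3_lookup: deleted inode referenced: 68711",
    "kernel: [261958.359401] EXT3-fs error (device dm-0): ext3_add_entry: bad entry in directory #85582: "
    "rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0",
]

for (dev, func), n in sorted(tally_ext3_errors(sample).items()):
    print(dev, func, n)
```

In practice the lines would come from the saved kernel log of each node rather than from a hard-coded list.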
CPU-Details (Sun X4100 Server) are:
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 285
stepping : 2
(I know this CPU has some bugs with virtualization; is filesystem corruption one of them?)
Regards,
Ulrich