linux-ext4 - EXT3 filesystem corruptions on AoE, RAID and LVM?


Message-ID: <45832321.4010305@tuxes.nl>
Date:	Fri, 15 Dec 2006 23:35:13 +0100
From:	Bas van Schaik <bas@...es.nl>
To:	Linux extfs development <linux-ext4@...r.kernel.org>
Subject: EXT3 filesystem corruptions on AoE, RAID and LVM?

Hi all,

I'm maintaining two clusters of machines running Debian Stable with
Etch kernels, which provide AoE (ATA over Ethernet) support. Machines
in these clusters "export" their hard disks using AoE, and one machine
in the cluster imports them using the kernel "aoe" module. On top of
those imported devices, multiple RAID5 arrays are created, with LVM
running on top of the RAID and ext3 on the LVM LV.

After a few days, I get EXT3 errors like this:
>> EXT3-fs: mounted filesystem with ordered data mode.
>> EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared for block 412186
>> Aborting journal on device loop0.
>> EXT3-fs error (device loop0) in ext3_free_blocks_sb: Journal has aborted
>> EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted
>> EXT3-fs error (device loop0) in ext3_truncate: Journal has aborted
>> EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted
>> EXT3-fs error (device loop0) in ext3_orphan_del: Journal has aborted
>> EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted
>> EXT3-fs error (device loop0) in ext3_delete_inode: Journal has aborted
>> __journal_remove_journal_head: freeing b_committed_data
>> __journal_remove_journal_head: freeing b_committed_data

(...)

>> __journal_remove_journal_head: freeing b_committed_data
>> ext3_abort called.
>> EXT3-fs error (device loop0): ext3_journal_start_sb: Detected aborted journal
>> Remounting filesystem read-only
>> __journal_remove_journal_head: freeing b_committed_data

Running fsck on the filesystem fixes those errors, but after a few days
(or weeks, depending on the fs load) the corruption appears again. It
might be worth mentioning that there are no other suspicious messages
in my logs.
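
To make a claim like "no other suspicious messages" easier to check, the
kernel messages can be tallied per device and reporting function. The
following is only a minimal sketch, assuming the log is available as plain
text lines in the format quoted above; the names EXT3_ERROR,
summarize_ext3_errors and first_error are hypothetical helpers, not part of
any existing tool.

```python
import re
from collections import Counter

# Matches both forms seen in the quoted log (assumed format):
#   EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already ...
#   EXT3-fs error (device loop0) in ext3_truncate: Journal has aborted
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<dev>[^)]+)\)(?: in|:) (?P<func>\w+):"
)


def summarize_ext3_errors(lines):
    """Count EXT3-fs error lines per (device, reporting function)."""
    counts = Counter()
    for line in lines:
        match = EXT3_ERROR.search(line)
        if match:
            counts[(match.group("dev"), match.group("func"))] += 1
    return counts


def first_error(lines):
    """Return the first EXT3-fs error line, or None if there is none."""
    for line in lines:
        if EXT3_ERROR.search(line):
            return line.strip()
    return None
```

The first error line is usually the interesting one, since the later
"Journal has aborted" messages are consequences of the initial abort.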

I saw some other discussions on the mailing list, but I don't think
they're related to my problem. I don't know whether I need to file a bug
for this, nor which details you need to help me solve it. So for now I
just want to hear your thoughts. FYI:

Kernel information for cluster 1:
>> root@...inity:~# uname -a
>> Linux infinity 2.6.17-2-686 #1 SMP Wed Sep 13 16:34:10 UTC 2006 i686 GNU/Linux

And cluster 2:
>> dust:~# uname -a
>> Linux dust 2.6.18-3-686 #1 SMP Thu Nov 23 20:49:23 UTC 2006 i686 GNU/Linux

Note that these are not vanilla kernels, but Debian kernels. However,
AFAIK there are no Debian-specific patches to AoE, ext3, LVM or RAID.

Thanks for your replies!

Best regards,

  -- Bas van Schaik