Message-ID: <20081203101100.GO17966@skl-net.de>
Date: Wed, 3 Dec 2008 11:11:00 +0100
From: Andre Noll <maan@...temlinux.org>
To: linux-ext4@...r.kernel.org
Subject: Problems with checking corrupted large ext3 file system
Hi,
I'm having trouble checking a corrupted 9T ext3 file system which resides
on a logical volume. The underlying physical volumes are three hardware
raid systems, one of which started to crash frequently. I was able
to pvmove away the data from the buggy system, so everything is fine
now on the hardware side.
However, the crashes left me with a seriously corrupted file system
from which I'm trying to recover as much as possible. First step was
to unmount the file system after users reported I/O errors when trying
to open files. The system log contained many messages like
[102445.420125] EXT3-fs error (device dm-2): ext3_free_blocks_sb: bit already cleared for block 544108393
and some of the form
[160301.277477] EXT3-fs error (device dm-2): htree_dirblock_to_tree: bad entry in directory #153542738: rec_len % 4 != 0 - offset=0, inode=1381653864, rec_len=26709, name_len=79
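For reference, the kind of sanity check behind the second message can be sketched as follows. This is a simplified model written for illustration, not the kernel's actual code: the function names are made up, the 4096-byte block size is an assumption, and only the checks that need no further file system state are included.

```python
def dir_rec_len(name_len: int) -> int:
    """Minimal record length for a name: 8-byte header plus name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(offset: int, rec_len: int, name_len: int,
                    block_size: int = 4096):
    """Return a description of the first failed check, or None if the entry looks sane.

    block_size=4096 is an assumption; the real value depends on the file system.
    """
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across blocks"
    return None


# Values taken from the log line above.
print(check_dir_entry(offset=0, rec_len=26709, name_len=79))
```

With the values from the log, the record length is not a multiple of four, which suggests the directory block contains something other than directory entries.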
So I compiled the master branch of the e2fsprogs git repo as of
Dec 1 (tip: 8680b4) and executed
./e2fsck -y -C0 /dev/mapper/abel-abt6_projects
This ran for a while and then started to output a couple of these:
Inode table for group 68217 is not in group. (block 825373744)
WARNING: SEVERE DATA LOSS POSSIBLE.
along with many lines of the form
Illegal block #3036172 (4233778405) in inode 115335438.
CLEARED.
But then it continued just fine without printing further
messages. After about 4 hours it completed but decided to re-run from
the beginning and this is where the real trouble seems to start. The
next day I found thousands of lines like this on the console:
/backup/data/solexa_analysis/ATH/MA/MA-30-29/run_30/4/length_42/reads_0.fl (inode #145326082, mod time Tue Jan 22 05:09:36 2008)
followed by
Clone multiply-claimed blocks? yes
At this point the fsck seems to hang. No further messages, no progress
bar for at least 17 hours. The lights on the raid system aren't
flashing but there seems to be a bit of I/O going on as stracing the
e2fsck process yields
lseek(3, 6206310776832, SEEK_SET) = 6206310776832
read(3, "002107740635\tD\t2\t169\t35\t0\thhhhhh"..., 4096) = 4096
lseek(3, 1263113973760, SEEK_SET) = 1263113973760
write(3, "B9K@...C=L-F77F4:CGGK\n3\t14221118"..., 4096) = 4096
lseek(3, 5861641846784, SEEK_SET) = 5861641846784
read(3, "hhhhhh\tIIIIIIIIIIIIIIIIIIIIIIIII"..., 4096) = 4096
lseek(3, 1263113977856, SEEK_SET) = 1263113977856
write(3, "\t1.00\t0.46\t19\t4\t2\t0\t1\tA\t33\t31\t0\t"..., 4096) = 4096
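To make the offsets easier to read, they can be converted to block numbers. The following is a small sketch that assumes a 4096-byte file system block size (inferred from the size of the reads and writes, not confirmed) and uses a hypothetical helper name.

```python
BLOCK_SIZE = 4096  # assumption: inferred from the 4096-byte reads and writes


def offset_to_block(offset: int, block_size: int = BLOCK_SIZE) -> int:
    """Convert a byte offset on the device to a block number."""
    if offset % block_size != 0:
        raise ValueError(f"offset {offset} is not block aligned")
    return offset // block_size


# (operation, byte offset) pairs copied from the strace output above.
trace = [
    ("read", 6206310776832),
    ("write", 1263113973760),
    ("read", 5861641846784),
    ("write", 1263113977856),
]

for op, offset in trace:
    print(f"{op:5s} block {offset_to_block(offset)}")
```

Under that assumption the two writes go to consecutive blocks while the two reads are far apart, which would be consistent with scattered blocks being copied one at a time into newly allocated space.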
There's only about one read per second, so the fsck might take rather
long if it continues to run at this speed ;)
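A rough back-of-the-envelope estimate of what that rate means, as a sketch only: the 1 GiB figure is a hypothetical amount chosen for illustration, since the amount of data that still has to be cloned is unknown, and the rate of one 4096-byte read per second is just the approximate value observed above.

```python
BLOCK_SIZE = 4096        # bytes per read, as seen in the strace output
READS_PER_SECOND = 1.0   # approximate observed rate


def hours_to_copy(num_bytes: int,
                  block_size: int = BLOCK_SIZE,
                  reads_per_second: float = READS_PER_SECOND) -> float:
    """Hours needed to copy num_bytes at the given block size and read rate."""
    blocks = -(-num_bytes // block_size)  # ceiling division
    return blocks / reads_per_second / 3600.0


one_gib = 1 << 30  # hypothetical amount of data, for illustration only
print(f"{hours_to_copy(one_gib):.1f} hours per GiB")
```

At this rate every GiB of multiply-claimed data would take roughly three days.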
It has been running for 34 hours now and I don't know what to do, so here are
a couple of questions for you ext3 gurus:
- Is there any hope this will ever complete?
- Should I abort the fsck and restart?
- Do things get even worse if I abort it and mount the file system
  r/o so that I can see whether important files are still there?
- Are there any magic e2fsck command line options I should try?
The box is a dual quad-core Intel machine with 32G RAM and is running
a vanilla 2.6.25.20 kernel. Any help is greatly appreciated.
Thanks
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe