Message-ID: <alpine.LFD.2.00.0904201237500.17316@xpc17.ast.cam.ac.uk>
Date: Mon, 20 Apr 2009 12:43:37 +0100 (BST)
From: Jeremy Sanders <jss@....cam.ac.uk>
To: Theodore Tso <tytso@....edu>
cc: linux-ext4@...r.kernel.org
Subject: Re: fsck.ext4: Group descriptors look bad... trying backup blocks...
On Mon, 20 Apr 2009, Theodore Tso wrote:
> On Mon, Apr 20, 2009 at 10:33:09AM +0100, Jeremy Sanders wrote:
>>
>> However, the system seems to mostly work, so I recreated the ext4 device,
>> I've just run my backup script again and fsck'd the device. It seems the
>> problem is reproducible with the new kernel:
>
> When you say reproducible, how many times have you tried it, and were
> you able to reproduce it every single time? 50% of the time? I do
> believe there is a problem, but we haven't been able to find something
> where it's easily reproducible. So if you can easily reproduce this,
> this is definitely very exciting.
It takes a day or two to do the sync. I've only done it twice (once with
the old kernel, once with the new Fedora testing kernel) and it happened
both times. I'm afraid that is a rather small sample.
I did a different, faster test (just copying my home directory lots of
times), but I wasn't able to get it to fail. That test didn't use much
disk space, however. Maybe it's worth just dd'ing a few TB of data onto
the device and seeing whether that fails.
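A rough sketch of what such a dd fill test might look like, assuming GNU dd
and sha256sum are available. The function names (fill_dir, verify_dir), the
file names and the SRC variable are made up for illustration; they are not
from any existing tool. It writes COUNT files of SIZE_MB megabytes each into
a directory on the mounted filesystem and records checksums so the data can
be re-read and compared later.

```shell
#!/bin/sh
# Sketch only: fill a directory with large files using dd, then verify them.
# fill_dir, verify_dir and SRC are hypothetical names for this example.

# Data source; /dev/urandom is slow for terabytes, /dev/zero is faster.
SRC=${SRC:-/dev/urandom}

# fill_dir DIR COUNT SIZE_MB: write COUNT files of SIZE_MB MiB each into DIR.
fill_dir() {
    dir=$1
    count=$2
    size_mb=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        dd if="$SRC" of="$dir/fill.$i" bs=1M count="$size_mb" 2>/dev/null \
            || return 1
        i=$((i + 1))
    done
    # Record checksums so the files can be compared after a remount.
    (cd "$dir" && sha256sum fill.* > SHA256SUMS)
}

# verify_dir DIR: re-read every file and compare with the recorded checksums.
verify_dir() {
    (cd "$1" && sha256sum -c --quiet SHA256SUMS)
}
```

For a run at the scale discussed here, something like "fill_dir /mnt/test
2500 1024" would write roughly 2.5TB (the mount point /mnt/test is an
assumption). Afterwards the filesystem could be unmounted and checked
without modifying it using "fsck.ext4 -fn /dev/md0", where -f forces a full
check and -n answers "no" to all questions.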
>> [root@...ck2 ~]# fsck /dev/md0
>> fsck 1.41.4 (27-Jan-2009)
>> e2fsck 1.41.4 (27-Jan-2009)
>> fsck.ext4: Group descriptors look bad... trying backup blocks...
>> Group descriptor 0 checksum is invalid. Fix<y>?
>
> Do you have to reboot to see this, or is it enough to unmount the
> filesystem? How big is the ext4 filesystem, and how big was the
> amount of data that you rsync'ed? One thing that would be worth
> trying if you can easily reproduce is whether it happens on a single
> device disk, or whether it only shows up when you use a /dev/mdX
> device.
I didn't reboot this time - I did last time. I just unmounted the file
system and fsck'd it. The filesystem is 8.2TB and the data is around 2.5TB.
The drives are on a 3ware card, so I could configure the card as a single
raid5 device and try to reproduce it there. It may take a day or two to
copy the data if I try this.
Jeremy
--
Jeremy Sanders <jss@....cam.ac.uk> http://www-xray.ast.cam.ac.uk/~jss/
X-Ray Group, Institute of Astronomy, University of Cambridge, UK.
Public Key Server PGP Key ID: E1AAE053