Message-ID: <4D925E27.6010309@redhat.com>
Date: Tue, 29 Mar 2011 17:33:11 -0500
From: Eric Sandeen <sandeen@...hat.com>
To: Daniel Taylor <Daniel.Taylor@....com>
CC: linux-ext4@...r.kernel.org
Subject: Re: breaking ext4 to test recovery

On 3/29/11 5:26 PM, Daniel Taylor wrote:
> Thanks for the suggestions. Tao Ma's got me started, but doing some
> of the more "devious" tests is on my list, too.
>
> The original issue was that during component stress testing, we were
> seeing instances of the ext4 file system becoming "read-only" (showing
> in /proc/mounts, but not "mount"). Looking back through the logs, we
> saw that at mount time, there was a complaint about a corrupted journal.

So, did it go "read-only" right at mount time due to a journal replay
failure? Or ...

> Some writing had occurred before the change to read-only, however.

That makes it sound like it did get mounted OK... and then something
went wrong? What did the logs say?
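
[Editor's sketch, not part of the original message.] One way to pin down
when the file system flipped to read-only is to poll /proc/mounts from the
test harness and log the first time the "ro" option appears. The helper
below is a minimal sketch under that assumption; the function name
is_read_only and the example mount point /data are hypothetical. It relies
only on the standard /proc/mounts field layout (device, mount point, type,
comma-separated options, dump, pass).

```python
from typing import Optional


def is_read_only(mounts_text: str, mount_point: str) -> Optional[bool]:
    """Report whether mount_point is mounted read-only.

    mounts_text is the content of /proc/mounts. Returns True or False for
    a mounted file system, or None if mount_point does not appear at all.
    """
    state = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected layout: device, mount point, fstype, options, dump, pass.
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        # Compare whole options so that "errors=remount-ro" does not match.
        # Keep scanning: a later entry for the same path is stacked on top.
        state = "ro" in fields[3].split(",")
    return state


def read_proc_mounts(path: str = "/proc/mounts") -> str:
    """Read the kernel's view of the mount table (Linux only)."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()
```

A harness could call is_read_only(read_proc_mounts(), "/data") once per
test iteration and record a timestamp when the result first becomes True,
which can then be matched against the kernel log.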
> The original mount script didn't check for any "mount" return value, so
> we theorized that ext4 just got to a point where it couldn't sensibly
> handle any more changes.

I'm not sure what that means, TBH :)
Just want to make sure you're barking up the right tree, here ...

-Eric

> It seemed that the right answer was to check the return value from mount
> and, if non-0, umount the file system, fix it, and try again. To test
> the return value from mount, I need to be able to corrupt, but not
> destroy the journal, since the component tests were taking days to show
> the failure.
>
> Running an "fsck -f" every time on a 3TB file system with an embedded
> PPC was just taking too much time to impose on a consumer-level customer.
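
[Editor's sketch, not part of the original message.] The "check the return
value from mount and, if non-0, umount, fix, and try again" flow described
above could look roughly like the following. This is a sketch under stated
assumptions, not the poster's actual script: the function name
mount_with_repair, the max_attempts parameter, and the choice of
"e2fsck -p" as the repair step are all assumptions made for illustration.
The command runner is injectable so the logic can be exercised without
touching a real device.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def mount_with_repair(device: str, mount_point: str,
                      runner: Runner = run, max_attempts: int = 2) -> bool:
    """Mount an ext4 file system, repairing and retrying if mount fails.

    Returns True once mount exits with status 0, False otherwise.
    """
    for attempt in range(1, max_attempts + 1):
        if runner(["mount", "-t", "ext4", device, mount_point]) == 0:
            return True
        if attempt == max_attempts:
            break  # out of retries; do not repair again
        # Make sure nothing is left mounted before repairing. The exit
        # status is ignored because a failed mount usually leaves nothing
        # to unmount.
        runner(["umount", mount_point])
        # Assumption: "e2fsck -p" (automatic safe repair) is the fix step.
        # e2fsck exit status 4 or higher means errors remain uncorrected
        # or the check itself failed, so retrying would not help.
        if runner(["e2fsck", "-p", device]) >= 4:
            break
    return False
```

Running the repair only after a failed mount, rather than "fsck -f" on
every boot, keeps the common path fast, which is the concern raised about
the 3TB file system.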