linux-ext4 - Re: breaking ext4 to test recovery


Message-ID: <4D925E27.6010309@redhat.com>
Date:	Tue, 29 Mar 2011 17:33:11 -0500
From:	Eric Sandeen <sandeen@...hat.com>
To:	Daniel Taylor <Daniel.Taylor@....com>
CC:	linux-ext4@...r.kernel.org
Subject: Re: breaking ext4 to test recovery

On 3/29/11 5:26 PM, Daniel Taylor wrote:
> Thanks for the suggestions.  Tao Ma's got me started, but doing some
> of the more "devious" tests is on my list, too.
> 
> The original issue was that during component stress testing, we were
> seeing instances of the ext4 file system becoming "read-only" (showing
> in /proc/mounts, but not "mount").  Looking back through the logs, we
> saw that at mount time, there was a complaint about a corrupted journal.

So, did it go "read-only" right at mount time due to a journal replay
failure? Or ...

> Some writing had occurred before the change to read-only, however.

That makes it sound like it did get mounted ok... and then something
went wrong?  What did the logs say?
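
For anyone checking the same symptom: /proc/mounts reports the kernel's current view of each mount, so a filesystem that was remounted read-only after an error shows "ro" in its options field there. A minimal sketch of that check, assuming the usual whitespace-separated /proc/mounts layout (device, mount point, type, options, ...); the function name read_only_mounts is made up for illustration:

```python
def read_only_mounts(mounts_text, fstype="ext4"):
    """Return mount points of the given type whose options include 'ro'.

    mounts_text is expected to look like the contents of /proc/mounts:
    device, mount point, filesystem type, comma-separated options, ...
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs, options = fields[:4]
        # Compare whole option tokens so "errors=remount-ro" is not
        # mistaken for the "ro" flag itself.
        if fs == fstype and "ro" in options.split(","):
            result.append(mount_point)
    return result
```

Passing it the text read from /proc/mounts lists the ext4 mount points that are currently read-only. It says nothing about why they went read-only, which is still a question for the logs.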
 
> The original mount script didn't check for any "mount" return value, so
> we theorized that ext4 just got to a point where it couldn't sensibly
> handle any more changes.

I'm not sure what that means, TBH :)

Just want to make sure you're barking up the right tree, here ...

-Eric

> It seemed that the right answer was to check the return value from mount
> and, if non-0, umount the file system, fix it, and try again.  To test
> the return value from mount, I need to be able to corrupt, but not
> destroy the journal, since the component tests were taking days to show
> the failure.
> 
> Running an "fsck -f" every time on a 3TB file system with an embedded
> PPC was just taking too much time to impose on a consumer-level customer.
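
The "check the return value, umount, fix, and try again" idea quoted above could be sketched roughly as follows. This is only an illustration under assumptions, not the script from the thread: the helper names, the retry count, and the choice of "e2fsck -p" as the repair step are all assumed here, and the command runner is injectable so the logic can be exercised without touching a real device.

```python
import subprocess


def run(cmd):
    """Run a command and return its exit status."""
    return subprocess.run(cmd).returncode


def mount_with_repair(device, mount_point, run=run, max_attempts=2):
    """Try to mount; on a non-zero status, umount, repair, and retry.

    Returns True if a mount attempt succeeded, False otherwise.
    """
    for _ in range(max_attempts):
        if run(["mount", "-t", "ext4", device, mount_point]) == 0:
            return True
        # mount reported failure: make sure nothing is left mounted.
        run(["umount", mount_point])
        # Assumed repair step: e2fsck in preen mode. Exit statuses of 4
        # or more mean errors were left uncorrected (or e2fsck failed).
        if run(["e2fsck", "-p", device]) >= 4:
            return False
    return False
```

This only covers failures that mount itself reports. It would not catch a file system that mounts cleanly and goes read-only later, which is the case Eric is asking about above.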
