Message-Id: <6617927D-7C9C-4D02-97FD-C9CC75609448@dilger.ca>
Date:	Thu, 31 Mar 2011 12:21:46 -1000
From:	Andreas Dilger <adilger@...ger.ca>
To:	Eric Sandeen <sandeen@...hat.com>
Cc:	Daniel Taylor <Daniel.Taylor@....com>, linux-ext4@...r.kernel.org
Subject: Re: breaking ext4 to test recovery

On 2011-03-29, at 3:50 AM, Eric Sandeen wrote:
> On 3/28/11 9:45 PM, Daniel Taylor wrote:
>> I would like to be able to break our ext4 file system
>> (specifically corrupt the journal) to be sure that we
>> can automatically notice the problem and attempt an
>> autonomous fix.
>> 
>> dumpe2fs tells me the inode, but not, that I can see, the
>> blocks where the journal exists (for "dd"ing junk to it).
>> 
>> Is there any debug tool that would let me deliberately
>> break the file system (at least, trash the journal)?
>> 
>> If not, is there a hint for figuring out the block(s) of
>> the journal so I can stomp it?
>> 
>> The kernel is in an embedded machine, so it's a little old
>> 2.6.32.11 and e2fsprogs/libs 1.41.12-2 (Lenny)
> 
> But are you trying to test in-kernel recovery, or e2fsck, after
> you corrupt the journal?  Or both?
> 
> I assume you'd start with a filesystem with a dirty log,
> corrupt that log, and then what, fsck it, or try to mount it?
> 
> How are you generating your fs w/ dirty log?
> 
> (xfs has an ioctl to abruptly "stop" the fs as if it had crashed,
> that would be very useful in extN as well).

We have a kernel patch, "dev_read_only", that we use with Lustre to disable writes to the block device while the device is in use.  This lets us simulate crashes at arbitrary points in the code or in test scripts.  It is based on the test harness Andrew Morton used for ext3 recovery testing back when ext3 was being ported to the 2.4 kernel.

http://git.whamcloud.com/?p=fs/lustre-release.git;a=blob_plain;f=lustre/kernel_patches/patches/dev_read_only-2.6.32-rhel6.patch;hb=HEAD

The best part of this patch is that it works with any block device and can simulate power failure without any need for automated power control.  Once the block device is unused (all buffers and references dropped), it can be re-activated safely.
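
To illustrate the idea only, here is a toy user-space model in Python.  It is not the kernel patch and does not use its interface: the class name DroppingBlockDevice and its methods are hypothetical, and it assumes that writes issued after the switch are dropped silently while the caller still sees success.  Everything written before the switch survives and everything after it is lost, which is what a power cut looks like to the on-disk state.

```python
class DroppingBlockDevice:
    """Toy in-memory block device (hypothetical; not the kernel patch)."""

    def __init__(self, num_blocks, block_size=4096):
        self.block_size = block_size
        # Every block starts out zero-filled.
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.read_only = False

    def set_read_only(self):
        # The simulated crash point: later writes never reach the "disk".
        self.read_only = True

    def clear_read_only(self):
        # Re-activate the device once nothing is using it any more.
        self.read_only = False

    def write(self, block_no, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        if self.read_only:
            # Assumption: the write is dropped and no error is reported.
            return
        self.blocks[block_no] = bytes(data)

    def read(self, block_no):
        return self.blocks[block_no]
```

A test would write some blocks, call set_read_only() at the point where the crash should happen, keep running the workload, and then inspect what actually landed on the device.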

> Another thing which could use lots more testing in the wild is
> simple journal recovery; nothing is corrupted, but the drive got
> unplugged or the system lost power while the fs was under load;
> see if a mount; umount; fsck and/or if a fsck; mount; umount; fsck finds
> errors.
> 
> (the former will test in-kernel log recovery, the latter will test
> log recovery in e2fsck).
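
The two sequences Eric describes could be scripted roughly as follows.  This is a minimal Python sketch, not an existing test tool: the function names recovery_sequences and run_sequence, the device and the mount point are placeholders, and the e2fsck flags are my choice (-f to force a check, -y so the first e2fsck pass replays the journal and fixes what it finds, -n so the final pass only reports).

```python
import subprocess


def recovery_sequences(device, mountpoint):
    """Return the two command sequences as lists of argv lists."""
    mount = ["mount", "-t", "ext4", device, mountpoint]
    umount = ["umount", mountpoint]
    fsck_replay = ["e2fsck", "-f", "-y", device]   # replays the journal
    fsck_verify = ["e2fsck", "-f", "-n", device]   # read-only final check
    return {
        # mount; umount; fsck: the kernel replays the log at mount time.
        "kernel": [mount, umount, fsck_verify],
        # fsck; mount; umount; fsck: e2fsck replays the log first.
        "e2fsck": [fsck_replay, mount, umount, fsck_verify],
    }


def run_sequence(commands, run=subprocess.run):
    """Run each command in order and return the list of exit statuses."""
    return [run(cmd).returncode for cmd in commands]
```

Run it as root against a scratch device that was cut off while under load, e.g. run_sequence(recovery_sequences("/dev/sdX1", "/mnt/test")["kernel"]).  A nonzero status from the final e2fsck -f -n means it found something to complain about.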

Cheers, Andreas




