Message-ID: <20170815173349.GA17774@li70-116.members.linode.com>
Date:   Tue, 15 Aug 2017 17:33:50 +0000
From:   Josef Bacik <josef@...icpanda.com>
To:     Vijay Chidambaram <vvijay03@...il.com>
Cc:     linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-btrfs@...r.kernel.og,
        vijay@...utexas.edu, Ashlie Martinez <ashmrtn@...xas.edu>
Subject: Re: CrashMonkey: A Framework to Systematically Test File-System
 Crash Consistency

On Mon, Aug 14, 2017 at 11:32:02AM -0500, Vijay Chidambaram wrote:
> Hi,
> 
> I'm Vijay Chidambaram, an Assistant Professor at the University of
> Texas at Austin. My research group is developing CrashMonkey, a
> file-system agnostic framework to test file-system crash consistency
> on power failures. We are developing CrashMonkey publicly on GitHub
> [1]. This is very much a work-in-progress, so we welcome feedback.
> 
> CrashMonkey works by recording all the IO from running a given
> workload, then *constructing* possible crash states (while honoring
> FUA and FLUSH flags). A crash state is the state of storage after an
> abrupt power failure or crash. For each crash state, CrashMonkey runs
> the file system's own fsck on the state and checks whether the
> file system recovers correctly. Once the file system mounts correctly,
> we can run further tests to check data consistency. The work was
> presented at HotStorage '17. The workshop paper is available at [2] and
> the slides at [3].
> 
> Our plan was to post on the mailing lists after reproducing an
> existing bug. We are not there yet, but I saw some posts where others
> were considering building something similar, so I thought I would post
> about our work.
> 
> [1] https://github.com/utsaslab/crashmonkey
> [2] http://www.cs.utexas.edu/~vijay/papers/hotstorage17-crashmonkey.pdf
> [3] http://www.cs.utexas.edu/~vijay/papers/hotstorage17-crashmonkey-slides.pdf
> 
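The crash-state construction described in the quoted message can be illustrated with a minimal Python sketch. It assumes a simplified model that is not CrashMonkey's or dm-log-writes' actual log format: `IO` and `crash_states` are hypothetical names, a disk image is a dict from sector to bytes, a flush makes every previously completed write durable, and a FUA write is durable as soon as it completes. Every subset of the unfenced writes is enumerated, so the cost is exponential and this only shows the idea.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List


@dataclass(frozen=True)
class IO:
    """Hypothetical record of one completed block request."""
    sector: int
    data: bytes = b""     # empty for a bare flush
    flush: bool = False   # preflush: all earlier completed writes are durable
    fua: bool = False     # force unit access: this write is durable on completion


def crash_states(log: List[IO]) -> List[Dict[int, bytes]]:
    """Enumerate images a power cut could leave, honoring FLUSH and FUA.

    For each crash point, writes fenced by a flush (and FUA writes) must be
    present; any subset of the other completed writes may also be present.
    Exponential in the number of unfenced writes, so illustration only.
    """
    states: List[Dict[int, bytes]] = []
    for crash_at in range(len(log) + 1):
        durable: List[int] = []   # log indices guaranteed to be on media
        pending: List[int] = []   # log indices that may or may not be
        for i, io in enumerate(log[:crash_at]):
            if io.flush:
                durable.extend(pending)
                pending.clear()
            if io.data:
                (durable if io.fua else pending).append(i)
        for r in range(len(pending) + 1):
            for subset in combinations(pending, r):
                image: Dict[int, bytes] = {}
                # Apply in log order so a later write to a sector wins.
                for j in sorted(durable + list(subset)):
                    image[log[j].sector] = log[j].data
                states.append(image)
    return states


if __name__ == "__main__":
    log = [IO(0, b"A"), IO(1, b"B"), IO(0, flush=True),
           IO(2, b"C", fua=True), IO(3, b"D")]
    print(len(crash_states(log)))  # 11
```

In the example, no state contains the FUA write `C` without `A` and `B`, because the flush before it fences them. The list can contain duplicate images reached from different crash points; a real tool would deduplicate them before running fsck on each one.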

I did this same work 3 years ago.

https://github.com/torvalds/linux/blob/master/Documentation/device-mapper/log-writes.txt
https://github.com/josefbacik/log-writes

I have xfstests patches I need to get upstreamed at some point: one runs
fsstress and then replays the logs and verifies the result, and another makes fsx
store state so we can verify fsync() is doing the right thing.  We run this on
our major releases on xfs, ext4, and btrfs to make sure everything is working
right internally at Facebook.  You'll notice a bunch of commits recently because
we thought we found an xfs replay problem (we didn't).  This stuff is actively
used, and I'd welcome contributions to it if you have anything to add.  One thing I
haven't done yet and have on my list is to randomly replay writes between
flush/fua, but it hasn't been a pressing priority yet.  Thanks,

Josef
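The idea of randomly replaying writes between flush/fua could be approximated by sampling crash images instead of enumerating them. The Python sketch below is hypothetical: `Write`, `epochs`, and `sample_crash` are invented names, not part of dm-log-writes or its replay tool, and the 1/2 drop probability is an arbitrary assumption. It splits the log into epochs at each flush, replays every epoch fenced by a completed flush, and then keeps FUA writes plus a random subset of the other writes issued before a random crash point.

```python
import random
from typing import Dict, List, NamedTuple


class Write(NamedTuple):
    """Hypothetical log record; empty data means a bare flush."""
    sector: int
    data: bytes
    flush: bool = False
    fua: bool = False


def epochs(log: List[Write]) -> List[List[Write]]:
    """Split the log so that every flush starts a new epoch."""
    out: List[List[Write]] = [[]]
    for w in log:
        if w.flush:
            out.append([])
        out[-1].append(w)
    return out


def sample_crash(log: List[Write], rng: random.Random) -> Dict[int, bytes]:
    """Sample one crash image under the simplified FLUSH/FUA model."""
    eps = epochs(log)
    k = rng.randrange(len(eps))
    image: Dict[int, bytes] = {}
    # Epochs before k were fenced by the flush that opens epoch k, so replay
    # them in full. For k > 0 the crash point is after that flush completed.
    for ep in eps[:k]:
        for w in ep:
            if w.data:
                image[w.sector] = w.data
    cut = rng.randint(1 if k else 0, len(eps[k]))
    # Writes in epoch k up to the crash point: FUA writes always persisted,
    # the rest each persisted with probability 1/2, applied in log order.
    for w in eps[k][:cut]:
        if w.data and (w.fua or rng.random() < 0.5):
            image[w.sector] = w.data
    return image


if __name__ == "__main__":
    log = [Write(0, b"A"), Write(1, b"B"), Write(0, b"", flush=True),
           Write(2, b"C", fua=True), Write(3, b"D")]
    print(sample_crash(log, random.Random(0)))
```

Seeding the `random.Random` instance makes a failing sample reproducible, which matters when each sampled image is then handed to fsck and a mount test.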
