linux-ext4 - Re: CrashMonkey: A Framework to Systematically Test File-System Crash Consistency

Date:   Wed, 16 Aug 2017 23:27:20 +0300
From:   Amir Goldstein <amir73il@...il.com>
To:     Vijay Chidambaram <vvijay03@...il.com>
Cc:     Josef Bacik <josef@...icpanda.com>,
        Ext4 <linux-ext4@...r.kernel.org>,
        linux-xfs <linux-xfs@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Linux Btrfs <linux-btrfs@...r.kernel.org>,
        Ashlie Martinez <ashmrtn@...xas.edu>, kernel-team@...com
Subject: Re: CrashMonkey: A Framework to Systematically Test File-System Crash Consistency

On Wed, Aug 16, 2017 at 10:06 PM, Vijay Chidambaram <vvijay03@...il.com> wrote:
> Hi Josef,
>
> Thank you for the detailed reply -- I think it provides several
> pointers for our future work. It sounds like we have a similar vision
> for where we want this to go, though we may disagree about how to
> implement this :) This is exciting!
>
> I agree that we should be building off existing work if it is a good
> option. We might end up using log-writes, but for now we see several
> problems:
>
> - The log-writes code is not documented well. As you have mentioned,
> at this point, only you know how it works, and we are not seeing a lot
> of adoption by other developers of log-writes as well.
>
> - I don't think our requirements exactly match what log-writes
> provides. For example, at some point we want to introduce checkpoints
> so that we can correlate a crash state with file-system state at the
> time of crash. We also want to add functionality to guide creation of
> random crash states (see below). This might require changing
> log-writes significantly. I don't know if that would be a good idea.
>
> Regarding random crashes, there is a lot of complexity there that
> log-writes couldn't handle without significant changes. For example,
> just randomly generating crash states and testing each state is
> unlikely to catch bugs. We need a more nuanced way of doing this. We
> plan to add a lot of functionality to CrashMonkey to (a) let the user
> guide crash-state generation (b) focus on "interesting" states (by
> re-ordering or dropping metadata). All of this will likely require
> adding more sophistication to the kernel module. I don't think we want
> to take log-writes and add a lot of extra functionality.
>
> Regarding logging writes, I think there is a difference in approach
> between log-writes and CrashMonkey. We don't really care about the
> completion order since the device may anyway re-order the writes after
> that point. Thus, the set of crash states generated by CrashMonkey is
> bound only by FUA and FLUSH flags. It sounds as if log-writes focuses
> on a more restricted set of crash states.
>
> CrashMonkey works with the 4.4 kernel, and we will try to keep up
> with changes to the kernel that break CrashMonkey. CrashMonkey is
> useless without the user-space component, so users will need to
> compile some code anyway. I do not believe it will matter much whether
> it is in-tree or not, as long as it compiles with the latest kernel.
>
> Regarding discard, multi-device support, and application-level crash
> consistency, this is on our road-map too! Our current priority is to
> build enough scaffolding to reproduce a known crash-consistency bug
> (such as the delayed allocation bug of ext4), and then go on and try
> to find new bugs in newer file systems like btrfs.
>
> Adding CrashMonkey into the kernel is not a priority at this point (I
> don't think CrashMonkey is useful enough at this point to do so). When
> CrashMonkey becomes useful enough to do so, we will perhaps add the
> device_wrapper as a DM target to enable adoption.
>
> Our hope currently is that developers like Ari will try out
> CrashMonkey in its current form, which will guide us as to what
> functionality to add to CrashMonkey to find bugs more effectively.
>
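
To make the quoted point about crash states being "bound only by FUA and
FLUSH flags" concrete, here is a minimal sketch in Python. It is not
CrashMonkey or log-writes code. It assumes a deliberately simplified model:
each logged write covers one whole block, a write marked flush means the
device cache was flushed before it (so every earlier write is durable), and
a write marked fua is itself durable once logged. The names Write and
crash_states are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from typing import NamedTuple


class Write(NamedTuple):
    """One logged block write (hypothetical record layout)."""
    block: int
    data: str
    flush: bool = False  # assumption: cache is flushed before this write
    fua: bool = False    # assumption: this write is durable once logged


def crash_states(log):
    """Enumerate disk images possible if we crash after the last logged write.

    Writes logged before the last flush are treated as durable, as are FUA
    writes. Every other write may or may not have reached the media, so each
    subset of those writes yields one candidate crash state.
    """
    # Index of the last write that carried a flush; nothing is durable if none did.
    last_flush = max((i for i, w in enumerate(log) if w.flush), default=0)
    # Non-FUA writes at or after the last flush may be lost in a crash.
    optional = [i for i in range(last_flush, len(log)) if not log[i].fua]

    states = set()
    for k in range(len(optional) + 1):
        for dropped in combinations(optional, k):
            disk = {}
            for i, w in enumerate(log):
                if i not in dropped:
                    disk[w.block] = w.data  # surviving writes applied in log order
            states.add(tuple(sorted(disk.items())))
    return states


if __name__ == "__main__":
    example_log = [
        Write(0, "data"),
        Write(1, "journal", flush=True),  # flush makes block 0 durable
        Write(2, "commit", fua=True),     # durable by itself
        Write(3, "more"),
    ]
    for state in sorted(crash_states(example_log)):
        print(state)
```

In this toy log, blocks 0 and 2 appear in every crash state, while blocks 1
and 3 may each be present or missing, which gives four states. The number of
states doubles with every in-flight write, which is one way to see why the
quoted message argues for guiding crash-state generation rather than
sampling states blindly.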

Vijay,

I can only speak for myself, but I think I represent other filesystem
developers with this response:
- With competing projects, the end result is often best when project
members cooperate to combine the best of both projects.
- Some of your project goals (e.g. user guided crash states) sound very
intriguing.
- IMO you are severely underestimating the benefits of mainlined
kernel code for other developers. If you find that the dm-log-writes target
is lacking functionality, it would be MUCH better if you worked to improve it.
Even better, make sure that your userspace tools can also work with the
reduced functionality in the mainline kernel.
- If you choose to complete your academic research before crossing over
to the existing code base, that is a reasonable choice for you to make, but
the reasonable choice for me to make is to try Josef's tools from his
repo (even if not documented) and *only* if they don't meet my needs
would I make the extra effort to try out CrashMonkey.
- AFAIK the state of filesystem crash consistency testing tools is not so
bright (except maybe at Facebook ;) ), so my priority is to get *some*
automated testing tools in motion.

In any case, I'm glad this discussion started and I hope it will expedite
the adoption of crash testing tools.
I wish you all the best with your project.

Amir.