linux-ext4 - Re: [RFC PATCH] fstests: Check if a fs can survive random (emulated) power loss

Message-ID: <CAOQ4uxhsvGVH19HZE6b1exnFt3pB6qwLr=PUkjpXeq12F9sNJQ@mail.gmail.com>
Date:   Mon, 26 Feb 2018 10:15:57 +0200
From:   Amir Goldstein <amir73il@...il.com>
To:     Qu Wenruo <wqu@...e.com>
Cc:     fstests <fstests@...r.kernel.org>,
        Linux Btrfs <linux-btrfs@...r.kernel.org>,
        linux-xfs <linux-xfs@...r.kernel.org>,
        Ext4 <linux-ext4@...r.kernel.org>,
        Josef Bacik <josef@...icpanda.com>
Subject: Re: [RFC PATCH] fstests: Check if a fs can survive random (emulated)
 power loss

On Mon, Feb 26, 2018 at 9:31 AM, Qu Wenruo <wqu@...e.com> wrote:
> This test case is originally designed to expose unexpected corruption
> for btrfs, where there are several reports about btrfs serious metadata
> corruption after power loss.
>
> The test case itself will trigger heavy fsstress for the fs, and use
> dm-flakey to emulate power loss by dropping all later writes.
>

Come on... dm-flakey is so 2016
You should take Josef's fsstress+log-writes test and bring it to fstests:
https://github.com/josefbacik/log-writes

By doing that you will gain two very important features from the test:

1. Problems will be discovered much faster, because the test can run fsck
    after every single block write has been replayed instead of just at random
    times like in your test

2. An absolute guarantee of reproducing the problem by replaying the write log.
    Even though your fsstress could use a pre-defined random seed, the results
    will be far from reproducible, because of process and IO scheduling
    differences between subsequent test runs.
    When you catch an inconsistency with the log-writes test, you can send the
    write-log recording to the maintainer to analyze the problem, even if it is
    a hard problem to hit. I used that technique for ext4, btrfs and xfs when
    I ran tests with generic/455 and found problems.
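
To illustrate the two points above, here is a minimal Python sketch of the idea
of replaying a recorded write log one write at a time and checking consistency
after each step. It is a toy model only: the log entry layout, the
`replay_and_check` helper and the `toy_fsck` rule are all hypothetical and do
not reflect the real dm-log-writes on-disk format or the replay tool from the
repository linked above.

```python
# Toy model of write-log replay; NOT the real dm-log-writes format or tooling.
from typing import Callable, Dict, List, Optional, Tuple

Write = Tuple[int, bytes]  # (block number, data): a hypothetical log entry


def replay_and_check(
    log: List[Write],
    check: Callable[[Dict[int, bytes]], bool],
) -> Optional[int]:
    """Replay the log one write at a time onto an empty device image.

    Returns the index of the first write after which `check` fails,
    or None if every intermediate state passes.
    """
    device: Dict[int, bytes] = {}
    for index, (block, data) in enumerate(log):
        device[block] = data
        if not check(device):
            return index
    return None


def toy_fsck(device: Dict[int, bytes]) -> bool:
    """Stand-in for fsck (an assumption for this sketch): block 0 acts as a
    'superblock' that may only point at a block that was already written."""
    pointer = device.get(0)
    if pointer is None:
        return True
    return int.from_bytes(pointer, "little") in device


# A recorded log in which the pointer is written before its target.
recorded_log: List[Write] = [
    (5, b"data"),
    (0, (7).to_bytes(4, "little")),  # points at block 7, not yet written
    (7, b"tree"),
]

first_bad = replay_and_check(recorded_log, toy_fsck)
print(first_bad)
```

Because the check runs after every replayed write, the short window in which
block 0 points at an unwritten block is always caught, and replaying the same
recorded log again stops at the same write regardless of how the original
workload was scheduled.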

Cheers,
Amir.