linux-kernel - Re: [PATCH 10/10] Dynamic fault injection

Message-ID: <20180518191040.GG31737@kmo-pixel>
Date:   Fri, 18 May 2018 15:10:40 -0400
From:   Kent Overstreet <kent.overstreet@...il.com>
To:     Andreas Dilger <adilger@...ger.ca>
Cc:     linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dave Chinner <dchinner@...hat.com>, darrick.wong@...cle.com,
        tytso@....edu, linux-btrfs@...r.kernel.org, clm@...com,
        jbacik@...com, viro@...iv.linux.org.uk, willy@...radead.org,
        peterz@...radead.org
Subject: Re: [PATCH 10/10] Dynamic fault injection

On Fri, May 18, 2018 at 01:05:20PM -0600, Andreas Dilger wrote:
> On May 18, 2018, at 1:49 AM, Kent Overstreet <kent.overstreet@...il.com> wrote:
> > 
> > Signed-off-by: Kent Overstreet <kent.overstreet@...il.com>
> 
> I agree with Christoph that even if there was some explanation in the cover
> letter, there should be something at least as good in the patch itself.  The
> cover letter is not saved, but the commit stays around forever, and should
> explain how this should be added to code, and how to use it from userspace.
> 
> 
> That said, I think this is a useful functionality.  We have something similar
> in Lustre (OBD_FAIL_CHECK() and friends) that is necessary for being able to
> test a distributed filesystem, which is just a CPP macro with an unlikely()
> branch, while this looks more sophisticated.  This looks like it has some
> added functionality like having more than one fault enabled at a time.
> If this lands we could likely switch our code over to using this.

This is pretty much what I was looking for; I just wanted to know whether this
patch was interesting enough to anyone that I should spend more time on it, or
whether I should just drop it :) Agreed on documentation. I think it's also
worth factoring out the functionality for the ELF section trick that dynamic
debug uses.
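
A rough userspace sketch of that section trick, for illustration only (this is
not code from the patch or from dynamic debug). It assumes GCC or clang on an
ELF target linked with GNU ld or lld; struct fault_point, FAULT_POINT(),
fault_point_set() and demo_read() are hypothetical names made up for the
example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor; field names are illustrative, not the patch's. */
struct fault_point {
	const char *name;
	const char *file;
	int line;
	int enabled;
};

/*
 * For a section whose name is a valid C identifier, GNU ld and lld
 * define __start_<name> and __stop_<name> at its bounds.
 */
extern struct fault_point __start_fault_points[];
extern struct fault_point __stop_fault_points[];

/*
 * Emit one static descriptor per call site into the "fault_points"
 * section and evaluate to its enabled flag. "used" keeps the compiler
 * from discarding the descriptor; the volatile read forces a real load
 * on every check, since the flag is only written through the section.
 */
#define FAULT_POINT(label)						\
({									\
	static struct fault_point fp_desc				\
		__attribute__((used, aligned(8),			\
			       section("fault_points"))) =		\
		{ (label), __FILE__, __LINE__, 0 };			\
	__builtin_expect(*(volatile int *)&fp_desc.enabled, 0);	\
})

/* Walk every descriptor in the section; return how many matched. */
int fault_point_set(const char *name, int enabled)
{
	struct fault_point *fp;
	int matched = 0;

	assert(name != NULL);
	for (fp = __start_fault_points; fp < __stop_fault_points; fp++) {
		if (strcmp(fp->name, name) == 0) {
			fp->enabled = enabled;
			matched++;
		}
	}
	return matched;
}

/* Example call site: fails only while its fault point is enabled. */
int demo_read(void)
{
	if (FAULT_POINT("demo_read"))
		return -5;	/* stand-in for -EIO */
	return 0;
}
```

The point of the trick is that call sites register themselves at link time, so
a control interface can list and toggle them without any central table.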

> Some things that are missing from this patch that are in our code:
> 
> - in addition to the basic "enabled" and "oneshot" mechanisms, we have:
>   - timeout: sleep for N msec to simulate network/disk/locking delays
>   - race: wait with one thread until a second thread hits matching check
> 
> We also have a "fail_val" that allows making the check conditional (e.g.
> only operation on server "N" should fail, only RPC opcode "N", etc).

Those all sound like good ideas... fail_val especially. I think with that we'd
have all the functionality the existing fault injection framework has (which is
way too heavyweight to actually get used, imo).
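
To make the "enabled", "oneshot" and fail_val semantics discussed above
concrete, here is a minimal sketch. It is an assumption about how such a check
could behave, not Lustre's OBD_FAIL_CHECK() and not the patch; struct
fault_site, FAULT_ANY and fault_check() are hypothetical names, and the code is
not thread-safe as written:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical wildcard: match any value passed by the caller. */
#define FAULT_ANY	(-1L)

enum fault_mode { FAULT_OFF, FAULT_ENABLED, FAULT_ONESHOT };

struct fault_site {
	enum fault_mode mode;
	long fail_val;		/* only fire when the caller's value matches */
	unsigned long hits;	/* number of times the fault actually fired */
};

/*
 * Return true if the caller should inject a failure. "val" is whatever
 * the call site wants to filter on (a server index, an RPC opcode, ...).
 */
bool fault_check(struct fault_site *site, long val)
{
	assert(site != NULL);

	/* Fast path: fault points are almost always disabled. */
	if (!unlikely(site->mode != FAULT_OFF))
		return false;

	if (site->fail_val != FAULT_ANY && site->fail_val != val)
		return false;

	site->hits++;
	if (site->mode == FAULT_ONESHOT)
		site->mode = FAULT_OFF;	/* disarm after the first hit */
	return true;
}
```

With mode set to FAULT_ONESHOT and fail_val set to 3, a call with val 2 does
nothing, the first call with val 3 fails, and later calls pass again. An
in-kernel version would need atomics or locking around the mode and hit count.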