Message-Id: <20080304155801.6f48bf08.akpm@linux-foundation.org>
Date: Tue, 4 Mar 2008 15:58:01 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Jan Kara <jack@...e.cz>
Cc: jbacik@...hat.com, linux-ext4@...r.kernel.org
Subject: Re: [RFC PATCH 1/1] add a jbd option to force an unclean journal state
On Tue, 4 Mar 2008 20:01:09 +0100
Jan Kara <jack@...e.cz> wrote:
> Hi,
>
> On Tue 04-03-08 13:39:41, Josef Bacik wrote:
> > jbd and I want a way to verify that I'm not screwing anything up in the
> > process, and this is what I came up with. Basically this option would only be
> > used in the case where someone mounts an ext3 image or fs, does a specific IO
> > operation (create 100 files, write data to a few files etc), unmounts the fs
> > and remounts so that jbd does its journal recovery and then check the status of
> the fs to make sure it's exactly the way it's expected to be. I'm not entirely
> sure how useful an option like this would be (or if I did it right :) ),
> > but I thought I'd throw it out there in case anybody thinks it may be useful,
> > and in case there is some case that I'm missing so I can fix it and better make
> > sure I don't mess anything up while doing stuff. Basically this patch keeps us
> > from resetting the journal's tail/transaction sequence when we destroy the
> > journal so when we mount the fs again it will look like we didn't unmount
> > properly and recovery will occur. Any comments are much appreciated,
> Actually, we've done checking like this in a different way (and I think
> also a more useful one), at least for ext3. Basically you mounted the
> filesystem with some timeout, and after the timeout the device was forced
> read-only. And then you checked that the fs was consistent after journal
> replay. I think Andrew had the patches somewhere...
About a billion years ago...
But the idea was (I think) good:
- mount the filesystem with `-o ro_after=100'
- the fs arms a timer to go off in 100 seconds
- now you start running some filesystem stress test
- the timer goes off. At timer-interrupt time, flags are set which cause
the low-level driver layer to start silently ignoring all writes to the
device which backs the filesystem.
This simulates a crash or poweroff.
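A minimal userspace sketch of that timer-plus-flag idea, not the actual old
patches: a timer handler sets a flag, and the device write path silently
drops writes once it is set. All names here (`fail_writes`, `dev_write`,
`ro_after_timeout`) are illustrative.

```c
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t fail_writes;   /* armed by the ro_after timer */

/* Timer expiry: from now on the "device" black-holes writes. */
static void ro_after_timeout(int sig)
{
	(void)sig;
	fail_writes = 1;
}

/* Stand-in for the low-level write path of the backing device. */
static size_t dev_write(char *disk, size_t off, const char *buf, size_t len)
{
	if (fail_writes)
		return len;	/* claim success but change nothing: this
				 * is what simulates the power cut */
	memcpy(disk + off, buf, len);
	return len;
}
```

The key point is that the flag is checked in the driver-level write path, not
in the filesystem: the fs above keeps "succeeding" while the media never
changes, which is exactly what a crash or power loss looks like on replay.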
- Now up in userspace we
- kill off the stresstest
- unmount the fs
- mount the fs (to run recovery)
- unmount the fs
- fsck it
- mount the fs
- check the data content of the files which the stresstest was writing:
look for uninitialised blocks, incorrect data, etc.
- unmount the fs
- start it all again.
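The loop above can be sketched as a script. `ro_after` is the hypothetical
mount option from this thread; the device, mountpoint, and
`verify_stress_data` checker are placeholders, and `fsstress` stands in for
whatever stress test is used. With DRY_RUN=1 set it only prints the commands,
so the sequencing can be checked without root or a scratch device.

```shell
DEV=${DEV:-/dev/sdX1}
MNT=${MNT:-/mnt/test}

run() {
	if [ -n "$DRY_RUN" ]; then
		echo "$@"		# preview mode: print, don't execute
	else
		"$@" || exit 1		# any failed step aborts the cycle
	fi
}

recovery_cycle() {
	run mount -o ro_after=100 "$DEV" "$MNT"
	run fsstress -d "$MNT/stress" -n 100000 -p 4 &	# keeps "succeeding"
	stress_pid=$!
	run sleep 110			# let the 100s timer fire mid-run
	kill "$stress_pid" 2>/dev/null
	run umount "$MNT"
	run mount "$DEV" "$MNT"		# journal recovery happens here
	run umount "$MNT"
	run fsck -fy "$DEV"
	run mount "$DEV" "$MNT"
	run verify_stress_data "$MNT"	# look for stale/uninitialised data
	run umount "$MNT"
}

# Overnight soak:     while :; do recovery_cycle; done
# Preview the steps:  DRY_RUN=1 recovery_cycle
```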
So it's 100% scriptable and can be left running overnight, etc. It found
quite a few problems with ext3/jbd recovery which I doubt could be found by
other means. This was 6-7 years ago and I'd expect that new recovery bugs
have crept in since then which it can expose.
I think we should implement this in a formal, mergeable fashion, as there
are numerous filesystems which could and should use this sort of testing
infrastructure.