linux-ext4 - Re: [PATCH 2/2] ext4: Automatically enable journal_async

Message-ID: <4AAA6124.6090509@redhat.com>
Date:	Fri, 11 Sep 2009 10:39:32 -0400
From:	Ric Wheeler <rwheeler@...hat.com>
To:	Theodore Tso <tytso@....edu>
CC:	Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH 2/2] ext4: Automatically enable journal_async_commit on
 ext4 file systems

On 09/11/2009 09:13 AM, Theodore Tso wrote:
> On Fri, Sep 11, 2009 at 07:07:27AM -0400, Ric Wheeler wrote:
>    
>> I still think that we are changing from a situation in which the drive state
>> with regard to our transactions is almost always consistent to one in
>> which we will often not be consistent.
>>
>> More or less, moving from tight control of the persistent state on the
>> platter to a situation in which, after power failure, we will more often
>> see a bad transaction.  The checksum will catch those conditions, but
>> catching and repairing is not the same as avoiding the need to repair in
>> the first place :)
>>      
> We won't need to repair anything.  We still have a barrier before we
> allow the filesystem to proceed with writing back buffers or
> allocating blocks that aren't safe to be written back or allocated
> until after the commit.
>
> So if the checksum doesn't match, we simply discard the last commit,
> and the filesystem will be in a consistent state.  This case is
> analogous to what happens if we didn't have enough time to write the
> journal blocks plus the commit blocks before the crash.  By removing
> the barrier before the commit block, it's possible for the commit
> block to be written before the rest of the journal blocks, but we can
> treat this case the same way that we treat a missing commit block ---
> we simply throw away the last transaction.
>
>
> The problem that I've worried about in the past is what happens if we
> have a checksum failure on some commit block *other* than the last
> commit block in the journal.  In that case, we *will* need to do a
> full file system check and repair, and it is a toss-up whether we are
> better off ignoring the checksum failure, replaying all of the
> journal transactions, and hoping that the checksum failure is caused by a
> corrupted data block that will be later overwritten by a later
> transaction, or whether we abort the journal replay immediately and
> not replay the later transactions.  Currently we do the latter, but
> the problem is that since we have already started reusing blocks that
> might have been deleted in previous transactions, and some of the
> buffers pinned by previous transactions have already been written out,
> the file system will be in trouble.  This is where adding per-block
> checksums into the journal descriptor blocks might allow us to do a
> better job of recovering from failures in the journal.
>
> *However*, this problem is totally orthogonal to the async commit.
> In the case of the last transaction, where some journal blocks were
> written out before the commit block was written out, it is safe to
> throw away the last transaction and consider it simply a "not
> committed transaction".
>
>    
>> The key is really how can we measure the impact of this in a realistic
>> way. How many fsck's are needed after a power fail? Chris's directory
>> corruption test?
>>      
> So the test should be that there should be *zero* file system
> corruptions caused by a power failure.  (Unless the power fail induces
> a hardware error, of course; if the stress caused by the power drop
> causes a head crash, nothing we can do about that in software!)  The
> async commit patch should be that safe.  If we can confirm that, then
> the case for making it be the default mount option should be a
> no-brainer.
>
>         	      	     	     	       - Ted
>    

The above makes sense to me. Now we just need to figure out how to test 
properly and verify :-(

ric
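
For reference, the replay rule Ted describes above can be sketched in a few
lines. This is only a toy model, not jbd2 code: Transaction, checksum() and
plan_replay() are hypothetical names, and zlib.crc32 stands in for whatever
checksum the journal actually uses. A checksum mismatch on the last
transaction is treated like a missing commit block (discard it, no repair
needed); a mismatch on any earlier transaction aborts the replay there and
flags the file system for a full check.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transaction:
    """Hypothetical in-memory view of one journal transaction."""
    blocks: List[bytes]  # journal blocks logged for this transaction
    commit_crc: int      # checksum recorded in the commit block


def checksum(blocks: List[bytes]) -> int:
    """Running CRC32 over all blocks (a stand-in for the real checksum)."""
    crc = 0
    for block in blocks:
        crc = zlib.crc32(block, crc)
    return crc


def plan_replay(journal: List[Transaction]) -> Tuple[int, bool]:
    """Return (transactions to replay, whether a full fsck is needed)."""
    for i, txn in enumerate(journal):
        if checksum(txn.blocks) != txn.commit_crc:
            is_last = (i == len(journal) - 1)
            # Last transaction: same as "never committed", just drop it.
            # Earlier transaction: stop replaying here; later blocks may
            # already have been reused, so the file system needs a check.
            return i, not is_last
    return len(journal), False
```

Feeding plan_replay() journals with a corrupted last transaction and with a
corrupted middle transaction is one cheap way to check that the two cases
are kept apart.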
