linux-ext4 - Re: ext4


Message-ID: <20120702180150.GB5795@thunk.org>
Date:	Mon, 2 Jul 2012 14:01:50 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	Jan Kara <jack@...e.cz>
Cc:	Eric Sandeen <sandeen@...hat.com>, Fredrick <fjohnber@...o.com>,
	Ric Wheeler <rwheeler@...hat.com>, linux-ext4@...r.kernel.org,
	Andreas Dilger <adilger@...ger.ca>, wenqing.lz@...bao.com
Subject: Re: ext4_fallocate

On Mon, Jul 02, 2012 at 07:44:21PM +0200, Jan Kara wrote:
>   Yes, that option is broken and basically unfixable for data=ordered mode
> (see http://comments.gmane.org/gmane.comp.file-systems.ext4/30727). For
> data=writeback it works fine AFAICT.

The journal_async_commit option can be saved, but it requires changing
how we handle stale data.  What we need to do is to change things so
that we update the metadata *after* the data has been written back.
We do this already if the experimental dioread_nolock code is used,
but currently it only works for 4k blocks. 
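
As an illustration of why the ordering matters, here is a toy sketch in Python. It is not ext4 code, and every name in it (extend_file, the block number 7, the "disk" dictionary) is hypothetical. It models a crash between the two steps of extending a file: if the metadata is updated first, the file can end up pointing at a block that still holds stale contents; if the data is written first, the file simply has not grown yet.

```python
# Toy model only: a "disk" of data blocks plus metadata listing the blocks
# a file owns. All names are made up for illustration; this is not ext4 code.

STALE = "old-secret"


def extend_file(order, crash_after_first_step):
    """Extend a file by one block and return what a reader sees afterwards.

    order: "metadata_first" or "data_first"
    crash_after_first_step: simulate a crash between the two steps
    """
    disk = {7: STALE}   # block 7 still holds someone else's old data
    file_blocks = []    # metadata: blocks that belong to the file

    def write_data():
        disk[7] = "new-data"

    def update_metadata():
        file_blocks.append(7)

    if order == "data_first":
        steps = [write_data, update_metadata]
    else:
        steps = [update_metadata, write_data]

    steps[0]()
    if not crash_after_first_step:
        steps[1]()

    # After "reboot", a reader sees every block the metadata points at.
    return [disk[b] for b in file_blocks]


if __name__ == "__main__":
    print(extend_file("metadata_first", crash_after_first_step=True))
    print(extend_file("data_first", crash_after_first_step=True))
```

With the metadata-first order, the crash case returns the stale block contents; with the data-first order, it returns an empty list because the file was never extended.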

The I/O tree work will give us the infrastructure we need so we can
easily update the metadata after the data blocks have been written out
when we are extending or filling in a sparse hole, even when the block
size != page size.  (This is why we can't currently make the
dioread_nolock code path the default; it would cause things to break
on 1k/2k file systems, as well as 4k file systems on Power.)  But once
this is done, it will allow us to subsume and rip out the dioread_nolock
code path, and the distinction between ordered and writeback mode.
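
To make the "block size != page size" cases concrete, a small Python sketch of the arithmetic follows. The helper name blocks_per_page is hypothetical, and the 64k page size used for Power is an assumption (a common configuration there), not something stated above.

```python
def blocks_per_page(page_size, block_size):
    """Number of file system blocks that share one memory page."""
    if page_size % block_size != 0:
        raise ValueError("page size must be a multiple of block size")
    return page_size // block_size


if __name__ == "__main__":
    # (page size, block size) pairs in bytes; the 64k page is an assumed
    # Power configuration.
    for page, block in [(4096, 4096), (4096, 1024), (4096, 2048), (65536, 4096)]:
        print(page, block, blocks_per_page(page, block))
```

Whenever the result is greater than 1, a single page can hold a mix of blocks that have and have not been written out, so the completion state has to be tracked at a finer granularity than the page.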

Also, the metadata checksum patches will fix the other potential
problem with using journal_async_commit: they add fine-grained
checksums in the journal, so we can recover more easily from a
corrupted journal.
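
A rough Python sketch of what per-block checksums buy during recovery is shown below. It is a toy model, not the jbd2 on-disk format: the function names are hypothetical, and zlib.crc32 is only a stand-in for whatever checksum the kernel actually uses.

```python
import zlib


def checksum(block: bytes) -> int:
    # Stand-in checksum for illustration only.
    return zlib.crc32(block)


def write_journal(blocks):
    """Store each journal block together with its own checksum."""
    return [(b, checksum(b)) for b in blocks]


def find_bad_blocks(journal):
    """Return the indexes of blocks whose contents no longer match."""
    return [i for i, (b, c) in enumerate(journal) if checksum(b) != c]


if __name__ == "__main__":
    journal = write_journal([b"a" * 16, b"b" * 16, b"c" * 16])
    # Simulate corruption of a single byte in the middle block.
    journal[1] = (b"b" * 15 + b"X", journal[1][1])
    print(find_bad_blocks(journal))
```

With one checksum per block, recovery can tell which block is damaged instead of only knowing that something in the transaction is wrong.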

So once all of this is stable, we'll be able to significantly simplify
the ext4 code and our testing matrix, and get all of the benefits of
data=writeback, dioread_nolock, and journal_async_commit, without any
of their current drawbacks.  Which is why I've kept on pestering Zheng
about how the I/O tree work has been coming along on the ext4 calls;
it's going to enable some really cool things.  :-)

    	       	     	      		    - Ted