linux-ext4 - Re: Question about BIGALLOC

Message-ID: <20110811025944.GE3625@thunk.org>
Date:	Wed, 10 Aug 2011 22:59:44 -0400
From:	Ted Ts'o <tytso@....edu>
To:	Robin Dong <hao.bigrat@...il.com>
Cc:	linux-ext4@...r.kernel.org
Subject: Re: Question about BIGALLOC

On Fri, Aug 05, 2011 at 03:59:01PM +0800, Robin Dong wrote:
> Hi, Ted
> 
> I am doing some testing of BIGALLOC using the "next" branch of e2fsprogs
> and a 3.0 kernel with the 23 bigalloc patches.
> 
> Everything seems to work, but I have a question:
> the "ee_len" field of "struct ext4_extent" counts blocks, not clusters,
> so an ext4_extent in a 4K-block-size filesystem can map at most 128MB
> even with the BIGALLOC feature enabled.  That means bigalloc gives us no
> benefit here for a file with a large number of blocks.
> 
> Is this by design, or will you change it in the next version?
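
A quick check of the 128MB figure in the question: ee_len is a 16-bit field, and ext4 treats lengths above 32768 as marking an uninitialized extent, so an initialized extent maps at most 32768 units. The constant name follows the kernel's EXT_INIT_MAX_LEN; the helper function is hypothetical and only does the arithmetic, including the hypothetical case where ee_len counted 64k clusters instead of blocks.

```python
# Mirrors the kernel constant EXT_INIT_MAX_LEN (ext4_extents.h): ee_len is
# 16 bits wide and values above 1 << 15 mark an uninitialized extent, so an
# initialized extent covers at most 32768 units of ee_len.
EXT_INIT_MAX_LEN = 1 << 15

MIB = 1 << 20


def max_extent_bytes(unit_size):
    """Bytes one initialized extent can map if ee_len counts units of
    `unit_size` bytes (blocks in ext4; clusters only hypothetically)."""
    return EXT_INIT_MAX_LEN * unit_size


if __name__ == "__main__":
    print(max_extent_bytes(4096) // MIB, "MiB with ee_len in 4k blocks")
    print(max_extent_bytes(64 * 1024) // MIB, "MiB if ee_len were in 64k clusters")
```

With 4k blocks this gives 128 MiB; only the hypothetical cluster-denominated ee_len would raise the ceiling, which is the trade-off discussed in the reply.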

It's not something I can change, because the VM subsystem
fundamentally assumes that file system block size is less than or
equal to the page size.  If I changed the granularity in the extent
length, then in the case of a sparse file, blocks would have to be
allocated and zeroed in units of a cluster.  The VM doesn't support
this well.

For example, suppose you fallocate a file to be 1 megabyte.  That
means that you have a 1mb extent which is marked as uninitialized.
Now suppose you mmap() this file, and then you write a single byte at
offset 20480.  This dirties a single 4k page, and when we write out
that page, we end up converting the 1mb uninitialized extent
into 3 extents: a 20k uninitialized extent, a 4k initialized extent,
and a 1000k uninitialized extent.  Now suppose this was done on a
bigalloc file system with a 64k cluster size.  If ee_len was
denominated in 64k cluster chunks, we couldn't express the concept of
a 20k or 4k extent.
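
The extent split in the fallocate/mmap example can be modelled with a small Python sketch. The Extent class and both helpers are hypothetical illustrations, not ext4 code: offsets and lengths are in bytes, merging with neighbouring extents is ignored, and expressible() simply asks whether an extent's start and length are whole multiples of a given unit.

```python
from dataclasses import dataclass

KIB = 1024


@dataclass
class Extent:
    start: int         # logical offset in bytes (hypothetical model)
    length: int        # length in bytes
    initialized: bool  # False models an uninitialized (fallocated) extent


def write_into_uninit(ext, offset, size):
    """Split an uninitialized extent around the written range
    [offset, offset + size): uninit head, initialized middle, uninit tail.
    Empty pieces are dropped; merging with neighbours is ignored."""
    if ext.initialized:
        raise ValueError("extent is already initialized")
    end = ext.start + ext.length
    if not (ext.start <= offset and offset + size <= end):
        raise ValueError("write falls outside the extent")
    pieces = [
        Extent(ext.start, offset - ext.start, False),
        Extent(offset, size, True),
        Extent(offset + size, end - (offset + size), False),
    ]
    return [p for p in pieces if p.length > 0]


def expressible(ext, unit):
    """True if both start and length are whole multiples of `unit`,
    i.e. the extent could be described in units of that size."""
    return ext.start % unit == 0 and ext.length % unit == 0


if __name__ == "__main__":
    # fallocate 1mb, then dirty one 4k page at offset 20480
    parts = write_into_uninit(Extent(0, 1024 * KIB, False), 20480, 4 * KIB)
    for p in parts:
        state = "initialized" if p.initialized else "uninitialized"
        print(f"{p.length // KIB}k {state}: "
              f"4k units ok={expressible(p, 4 * KIB)}, "
              f"64k clusters ok={expressible(p, 64 * KIB)}")
```

All three pieces (20k, 4k, 1000k) are whole multiples of 4k, but none of them lines up with 64k, so a cluster-denominated ee_len could not describe the result of the write.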

This is the same reason why we can't support a 64k file system block
size.  If the user dirties a single 4k block in an otherwise sparse
file, the VM would have to instantiate the other 60k of pages (fifteen 4k pages) and zero
them (atomically!) --- and the VM doesn't know how to do this.
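
The same page arithmetic applies to a 64k block size on a machine with 4k pages (the 4k page size is an assumption here; it is typical of x86). Dirtying one page of an unallocated 64k block leaves fifteen other pages, 60k in total, that would have to be created and zeroed together. The helper below is a hypothetical illustration of that count, not a kernel interface.

```python
PAGE_SIZE = 4096  # assumption: 4k pages, as on x86


def extra_pages_to_zero(block_size, page_size=PAGE_SIZE):
    """Hypothetical count of the *other* pages the VM would have to create
    and zero when a single page of an unallocated block is dirtied."""
    if block_size <= page_size:
        return 0  # block fits in one page: the case the VM already handles
    if block_size % page_size:
        raise ValueError("block size must be a multiple of the page size")
    return block_size // page_size - 1


if __name__ == "__main__":
    n = extra_pages_to_zero(64 * 1024)
    print(n, "pages,", n * PAGE_SIZE // 1024, "k")  # 15 pages, 60 k
```

A result of zero for any block size at or below the page size reflects the VM assumption described at the start of the reply.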

     		       	       	  	  - Ted