[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070507233622.GB29907@thunk.org>
Date: Mon, 7 May 2007 19:36:22 -0400
From: Theodore Tso <tytso@....edu>
To: Jeff Garzik <jeff@...zik.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
"Amit K. Arora" <aarora@...ux.vnet.ibm.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-ext4@...r.kernel.org, xfs@....sgi.com, suparna@...ibm.com,
cmm@...ibm.com
Subject: Re: [PATCH 4/5] ext4: fallocate support in ext4
On Mon, May 07, 2007 at 07:02:32PM -0400, Jeff Garzik wrote:
> Andreas Dilger wrote:
> >On May 07, 2007 13:58 -0700, Andrew Morton wrote:
> >>Final point: it's fairly disappointing that the present implementation is
> >>ext4-only, and extent-only. I do think we should be aiming at an ext4
> >>bitmap-based implementation and an ext3 implementation.
> >
> >Actually, this is a non-issue. The reason that it is handled for
> >extent-only
> >is that this is the only way to allocate space in the filesystem without
> >doing the explicit zeroing. For other filesystems (including ext3 and
>
> Precisely /how/ do you avoid the zeroing issue, for extents?
>
> If I posix_fallocate() 20GB on ext4, it damn well better be zeroed,
> otherwise the implementation is broken.
There is a bit in the extent structure which indicates that the extent
has not been initialized. When reading from a block where the extent
is marked as unitialized, ext4 returns zero's, to avoid returning the
uninitalized contents of the disk, which might contain someone else's
love letters, p0rn, or other information which we shouldn't leak out.
When writing to an extent which is uninitalized, we may potentially
have to split the extent into three extents in the worst case.
My understanding is that XFS uses a similar implementation; it's a
pretty obvious and standard way to implement allocated-but-not-initialized
extents.
We thought about supporting persistent preallocation for inodes using
indirect blocks, but it would require stealing a bit from each entry
in the indirect block, reducing the maximum size of the filesystem by
two (i.e., 2**31 blocks). It was decided it wasn't worth the
complexity, given the tradeoffs.
- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists