Message-ID: <20061025020927.GS8394166@melbourne.sgi.com>
Date: Wed, 25 Oct 2006 12:09:27 +1000
From: David Chinner <dgc@....com>
To: Theodore Tso <tytso@....edu>
Cc: David Chinner <dgc@....com>, Jeff Garzik <jeff@...zik.org>,
Alex Tomas <alex@...sterfs.com>, Jan Kara <jack@...e.cz>,
linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: [RFC] Ext3 online defrag
On Tue, Oct 24, 2006 at 03:44:16PM -0400, Theodore Tso wrote:
> On Tue, Oct 24, 2006 at 11:59:28PM +1000, David Chinner wrote:
> > That's the wrong way to look at it. If you want the userspace
> > process to specify a location, then you should preallocate it first
> > before doing anything else. There is no need to clutter a simple
> > data mover interface with all sorts of unnecessary error handling.
>
> This is doable, but it adds a huge amount of complexity before we
> could implement on-line defragmentation.
>
> First of all, we would need a way of allowing userspace to specify
> which blocks should be used in the preallocation.
Not initially. Create a file, and call posix_fallocate() on it.
Later, the filesystem can provide something that the defrag tool can
use for fine-grained control of where the preallocated blocks are on
disk.
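As a rough sketch of that first step (create a file, then call
posix_fallocate() on it), something like the following would do. The
helper name prealloc_file() and its error handling are illustrative
assumptions, not part of any proposed interface; only open(),
posix_fallocate(), close() and unlink() are real calls here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) `path` and reserve `len` bytes of disk space for it.
 * Returns an open file descriptor on success, -1 on failure.
 * prealloc_file() is a hypothetical helper name used for illustration.
 */
int prealloc_file(const char *path, off_t len)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* posix_fallocate() returns an error number directly, not via errno. */
    int err = posix_fallocate(fd, 0, len);
    if (err != 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    return fd;
}
```

Where the blocks end up on disk is entirely up to the filesystem's
allocator in this version, which is the point: placement control can
come later without changing how the defrag tool drives it.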
> Secondly, we would need a way of marking blocks as "preallocated but
> not pre-zeroed"; otherwise we would have to zero out all of the blocks
> in order to assure security (don't want userspace programs seeing the
> previous contents of the data blocks), only to do the copy and the
> extents vector swap.
The unlinked inode method avoids this problem because no user space
process can see the inode to open it. Also, posix_fallocate() zeroes
the disk blocks, so that alone protects against data exposure.
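From userspace, the closest approximation of the unlinked inode method
that I can sketch is to create a temporary file, drop its name straight
away, and preallocate through the descriptor that is still open. The
helper name prealloc_unlinked() and the ".defrag-XXXXXX" template are
assumptions made up for this example. Note that the name does exist for
a short window between mkstemp() and unlink(); an in-kernel
implementation would not need to expose one at all.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a temporary file in `dir`, remove its name, and preallocate
 * `len` bytes for it. Returns an fd for the now nameless inode, or -1.
 * prealloc_unlinked() is a hypothetical helper name.
 */
int prealloc_unlinked(const char *dir, off_t len)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/.defrag-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    int fd = mkstemp(path);     /* creates and opens a uniquely named file */
    if (fd < 0)
        return -1;

    /* Drop the only name; the inode stays alive until fd is closed. */
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }

    /* Reserve the space; returns an error number rather than setting errno. */
    if (posix_fallocate(fd, 0, len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Closing the descriptor frees the preallocated space again, so a crashed
defrag tool leaves nothing behind in the namespace.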
So, now all that remains for an initial implementation is the swap
extents transaction and the data mover syscall.
For a smart, fast implementation, I agree that you need unwritten
extents (which XFS already has), then a fast filesystem
implementation of posix_fallocate() that utilises unwritten extents
(which XFS already has), and finally another interface that allows
you to allocate unwritten extents in an arbitrary location within
the filesystem (which no filesystem currently has).
> That's a huge amount of work, and while the above two features can be
> useful for other things, it's not clear it's worth it to require this
> as the only way to implement on-line defragging. You're right that
> it's a way of making things be more generic, but it means that each
> filesystem needs to have a huge amount of additional complexity and
> potential filesystem format changes before they could take advantage
> of this general framework.
I disagree - it's not a huge amount of work to get something
working and to solidify the generic interfaces, and the only format
change is a new transaction. Any filesystem that supports the swap
extent/blocks method would then work better than XFS's current
online defrag tool, which does not use preallocation,
nor does it use splice...
> (For example, you'd never be able to do this with the FAT filesystem,
> or the ext2 or ext3 filesystems; it would work for ext4 only *after*
> we implement the above mentioned new features and the associated
> filesystem format changes.)
Sure, but they can use the slow, unoptimised posix_fallocate() method
for allocating disk space....
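To make "slow, unoptimised" concrete: without unwritten extents, the
only way to get blocks allocated is to write to them. A minimal sketch
of what that amounts to is below. This is not how any particular libc
or filesystem implements posix_fallocate(); create_zero_filled() is a
hypothetical helper that only illustrates the cost, namely that every
byte of the new file is written once before the real copy starts.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a new file at `path` and force allocation of `len` bytes by
 * writing zeroes over the whole range. Returns an open fd, or -1.
 * create_zero_filled() is a hypothetical name used for illustration.
 */
int create_zero_filled(const char *path, off_t len)
{
    static const char zeros[65536];   /* static storage is zero-initialised */

    /* O_EXCL: refuse to overwrite an existing file with zeroes. */
    int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    off_t off = 0;
    while (off < len) {
        size_t want = sizeof(zeros);
        if ((off_t)want > len - off)
            want = (size_t)(len - off);

        ssize_t n = pwrite(fd, zeros, want, off);
        if (n < 0 && errno == EINTR)
            continue;                 /* interrupted, retry the same chunk */
        if (n <= 0) {
            close(fd);
            unlink(path);
            return -1;
        }
        off += n;                     /* short writes just advance less */
    }
    return fd;
}
```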
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group