Date:	Wed, 25 Oct 2006 02:01:28 +1000
From:	David Chinner <dgc@....com>
To:	Dave Kleikamp <shaggy@...tin.ibm.com>
Cc:	David Chinner <dgc@....com>, Jeff Garzik <jeff@...zik.org>,
	Alex Tomas <alex@...sterfs.com>, Theodore Tso <tytso@....edu>,
	Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
	linux-ext4@...r.kernel.org
Subject: Re: [RFC] Ext3 online defrag

On Tue, Oct 24, 2006 at 09:51:41AM -0500, Dave Kleikamp wrote:
> On Tue, 2006-10-24 at 23:59 +1000, David Chinner wrote:
> > On Tue, Oct 24, 2006 at 12:14:33AM -0400, Jeff Garzik wrote:
> > > On Mon, Oct 23, 2006 at 06:31:40PM +0400, Alex Tomas wrote:
> > > > isn't that a kernel responsibility to find/allocate target blocks?
> > > > wouldn't it better to specify desirable target group and minimal
> > > > acceptable chunk of free blocks?
> > > 
> > > The kernel doesn't have enough knowledge to know whether or not the
> > > defragger prefers one blkdev location over another.
> > > 
> > > When you are trying to consolidate blocks, you must specify the
> > > destination as well as source blocks.
> > > 
> > > Certainly, to prevent corruption and other nastiness, you must fail if
> > > the destination isn't available...
> > 
> > That's the wrong way to look at it. If you want the userspace
> > process to specify a location, then you should preallocate it first
> > before doing anything else. There is no need to clutter a simple
> > data mover interface with all sorts of unnecessary error handling.
> 
> You are implying that the 2-step interface, creating a new inode then
> swapping the contents, is the only way to implement this.

No, it's not the only way to implement it, but it seems the cleanest
way to me when you have to consider crash recovery. With a temporary
inode, you can create it, hold a reference and then unlink it so
that any crash at that point will free the inode and any extents
it has on it.
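
As a rough userspace illustration of the create/hold/unlink pattern
described above (a sketch only, not the proposed kernel interface): the
helper name open_unlinked_temp is hypothetical, and it assumes a POSIX
system where an unlinked file stays usable through its open descriptor
and is freed once that descriptor goes away, including after a crash.

```python
import os
import tempfile


def open_unlinked_temp(directory):
    """Create a temp file in `directory`, keep it open, and unlink its name.

    Hypothetical helper for illustration. After the unlink the file has no
    directory entry, so its inode and blocks are released as soon as the
    returned descriptor is closed or the process dies.
    """
    fd, path = tempfile.mkstemp(dir=directory)  # create the temporary inode
    os.unlink(path)                             # drop the name, keep the reference
    return fd, path


if __name__ == "__main__":
    fd, old_path = open_unlinked_temp(tempfile.gettempdir())
    os.write(fd, b"staged copy")                # still writable via the descriptor
    print(os.path.exists(old_path))             # the name is gone
    os.close(fd)                                # last reference dropped, space freed
```

The temporary file has to live on the same filesystem as the file being
defragmented for any later extent swap to make sense, which is why the
directory is passed in rather than defaulted.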

The only way I can see anything different working is to have the
filesystem hold the extents somewhere internally in a way that gives us
the same recovery guarantees while we copy the data and insert the new
extents.  This is obviously a filesystem-specific solution and is
more complex to implement than a swap extent transaction. It
probably also needs on-disk format changes to support properly....

> > Once you've separated the destination allocation from the data
> > mover, the mover is basically a splice copy from source to
> > destination, an fsync and then an atomic swap blocks/extents operation.
> > Most of this code is generic, and a per-fs swap-extents vector
> > could be easily provided for the one bit that is not....
> 
> The benefit of having such a simple data mover is negated by moving the
> complexity into the allocator.

What complexity does it introduce that the allocator doesn't already
have, or wouldn't need to provide anyway for the single-call interface
to work?

> A single interface that would move a part of a file at a time has the
> advantage that a large file which is only fragmented in a few areas does
> not need to be completely moved.

And the two-step process can do exactly this as well - splice can
work on any offset within the file...
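
A minimal sketch of moving only part of a file, assuming a POSIX system:
the function name copy_range is hypothetical, and plain pread/pwrite
stands in here for the splice copy. The final swap blocks/extents step
is filesystem-specific and is deliberately not shown.

```python
import os


def copy_range(src_fd, dst_fd, offset, length, chunk=1 << 20):
    """Copy bytes [offset, offset + length) from src_fd to the same offset
    in dst_fd, then fsync dst_fd. Returns the number of bytes copied.

    Hypothetical helper for illustration; pread/pwrite is a stand-in for
    a splice-based copy.
    """
    copied = 0
    while copied < length:
        buf = os.pread(src_fd, min(chunk, length - copied), offset + copied)
        if not buf:
            break  # reached end of the source file
        written = 0
        while written < len(buf):  # pwrite may write fewer bytes than asked
            written += os.pwrite(dst_fd, buf[written:], offset + copied + written)
        copied += len(buf)
    os.fsync(dst_fd)  # data must be stable before any extent swap
    return copied
```

Only the fragmented range is read and rewritten; the rest of a large
file is never touched.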

> > The allocation interface, OTOH, is anything but simple and is really
> > a filesystem specific interface. Seems logical to me to separate
> > the two. 
> 
> So what then is the benefit of having a simple generic data mover if
> every file system needs to implement its own interface to allocate a
> copy of the data?

I assume you meant "....allocate the space to store the copy of the data."

The allocation interface needs to be extensible independently of the
data mover interface. XFS already exposes allocation ioctls to
userspace for preallocation and we've got plans to extend this further
to allow userspace controlled allocation for smart defrag tools for
XFS. Tying allocation to the data mover just makes the interface less
flexible and harder to do anything smart with....
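
To make the "allocate first, move later" split concrete, here is a
minimal sketch that uses the generic posix_fallocate call as a stand-in
for a filesystem-specific allocation interface. It is not the XFS ioctl
interface, it gives no control over where the blocks land, and the
function name preallocate is hypothetical.

```python
import os


def preallocate(fd, length):
    """Reserve `length` bytes of space for fd before any data is copied.

    Hypothetical helper for illustration. posix_fallocate raises OSError
    (for example ENOSPC) if the space cannot be reserved, so the failure
    surfaces here rather than inside the data mover.
    """
    os.posix_fallocate(fd, 0, length)
    return os.fstat(fd).st_size
```

With the space reserved up front, the mover itself needs no
out-of-space handling.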

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group