linux-ext4 - Re: [PATCH]ext4: online defrag: Enable to reuse blocks by multiple defrag


Message-ID: <493F7732.1020505@rs.jp.nec.com>
Date:	Wed, 10 Dec 2008 17:00:50 +0900
From:	Akira Fujita <a-fujita@...jp.nec.com>
To:	Theodore Tso <tytso@....edu>
CC:	linux-ext4@...r.kernel.org
Subject: Re: [PATCH]ext4: online defrag: Enable to reuse blocks by multiple
 defrag

Hi Ted,

Thank you for letting me know.
I think the new defrag can be implemented based on your proposal.
First, I plan to implement the usual defrag (without any options)
in the following steps.
Please check whether my approach is reasonable.

(U: User space, K: Kernel)
1:U  Create a donor inode and then unlink it.

2:U  Allocate contiguous blocks to the donor inode with fallocate().

3:U  Call the FS_IOC_FIEMAP ioctl to get the extent information of the donor inode,
      and check that the donor inode has fewer extents than the defrag target inode.

4:U  Call the EXT4_IOC_DEFRAG ioctl to exchange the data between
      the target inode and the donor inode.

5:K  The EXT4_IOC_DEFRAG ioctl calls ext4_defrag() in the kernel
      (I'm going to change the current ext4_defrag() so that it only exchanges data).

* Steps 4 and 5 correspond to Ted's ioctl (3).

6:U  Close the fd of the donor inode. Because the donor was unlinked in step 1,
      this frees the blocks it now holds (the target's old, fragmented blocks).
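On the user-space side, steps 1-3 (and the close in step 6) could look roughly like the code below. It is a minimal sketch and not e4defrag code. The helpers create_donor(), count_extents() and donor_is_better() are hypothetical. The sketch also assumes the donor is created in a directory on the same filesystem as the target. Calling FS_IOC_FIEMAP with fm_extent_count set to 0 makes the kernel report only the number of extents, and that number is all step 3 needs.

```c
#define _GNU_SOURCE             /* fallocate() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fiemap.h>       /* struct fiemap, FIEMAP_* */
#include <linux/fs.h>           /* FS_IOC_FIEMAP */

/* Steps 1-2 (hypothetical helper): create a donor file, unlink it at once
 * so its blocks are released when the fd is closed (step 6), then
 * preallocate len bytes. path_template must end in "XXXXXX" and name a
 * directory on the same filesystem as the target.
 * Returns the donor fd, or -1 on error. */
int create_donor(char *path_template, off_t len)
{
    assert(len > 0);            /* fallocate() rejects a zero length */
    int fd = mkstemp(path_template);
    if (fd < 0)
        return -1;
    if (unlink(path_template) < 0 || fallocate(fd, 0, 0, len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Step 3: ask FIEMAP how many extents map the whole file. With
 * fm_extent_count == 0 the kernel fills in fm_mapped_extents only. */
long count_extents(int fd)
{
    struct fiemap fm;
    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;
    fm.fm_flags = FIEMAP_FLAG_SYNC;   /* flush delayed allocation first */
    fm.fm_extent_count = 0;
    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0)
        return -1;
    return (long)fm.fm_mapped_extents;
}

/* Step 3 (continued): only worth exchanging if the donor has fewer
 * extents than the target. Returns nonzero if so. */
int donor_is_better(int target_fd, int donor_fd)
{
    long t = count_extents(target_fd);
    long d = count_extents(donor_fd);
    return t > 0 && d >= 0 && d < t;
}
```

A caller would pass a template located in the target's directory. If donor_is_better() returns nonzero, the caller issues the exchange ioctl and then calls close() on the donor fd.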


The new EXT4_IOC_DEFRAG would be implemented as follows.

#include <linux/ioctl.h>	/* _IOW() */

#define EXT4_IOC_DEFRAG                 _IOW('f', 15, struct move_extent)

struct move_extent
{
     int org_fd;		/* file descriptor of defrag target file */
     int dest_fd;	/* file descriptor of donor file */
     long long start;	/* logical block offset of target file */
     long long len;	/* length of data to exchange, in blocks */
};
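The code below illustrates how user space might fill in the proposed request for a whole-file exchange. It is a sketch against the structure as proposed in this mail, not a merged kernel ABI. The helpers bytes_to_blocks(), fill_whole_file_request() and exchange_blocks() are hypothetical. The proposal does not say which file descriptor the ioctl is issued on, so this sketch assumes the target's.

```c
#include <assert.h>
#include <sys/ioctl.h>          /* ioctl() */
#include <linux/ioctl.h>        /* _IOW(), _IOC_NR(), ... */

/* Request layout as proposed in this mail (not a merged kernel ABI). */
struct move_extent {
    int org_fd;                 /* file descriptor of defrag target file */
    int dest_fd;                /* file descriptor of donor file */
    long long start;            /* logical block offset of target file */
    long long len;              /* length of data to exchange, in blocks */
};

#define EXT4_IOC_DEFRAG _IOW('f', 15, struct move_extent)

/* Blocks needed to cover size bytes, rounding a partial block up. */
long long bytes_to_blocks(long long size, long long block_size)
{
    assert(size >= 0 && block_size > 0);
    return (size + block_size - 1) / block_size;
}

/* Hypothetical helper: request an exchange of the whole target file. */
void fill_whole_file_request(struct move_extent *req, int org_fd,
                             int donor_fd, long long size,
                             long long block_size)
{
    req->org_fd = org_fd;
    req->dest_fd = donor_fd;
    req->start = 0;
    req->len = bytes_to_blocks(size, block_size);
}

/* Steps 4-5: issue the proposed ioctl. Assumes it is called on the
 * target's fd; returns ioctl()'s result (-1 with errno set on error). */
int exchange_blocks(const struct move_extent *req)
{
    return ioctl(req->org_fd, EXT4_IOC_DEFRAG, req);
}
```

exchange_blocks() can only succeed on a kernel that implements this proposal. On any other kernel it simply returns -1.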


The defrag -r and -f options can also be implemented with interfaces (1) and (2)
from your previous post.  I will address them after implementing the usual defrag.


Regards,
Akira Fujita

Theodore Tso wrote:
> On Tue, Dec 09, 2008 at 11:26:37AM +0900, Akira Fujita wrote:
>> I'm redesigning ext4 online defrag based on the comments from Ted.
>> Probably defrag's block allocation method will be changed greatly.
> 
> Akira-san,
> 
> FYI, there was a discussion about defrag on today's ext4 call.  One of
> the ideas that was kicked around was to completely change the
> primitives used by defrag, and to design things around three
> primitive, general purpose interfaces.
> 
> We didn't go into complete detail on the call, but let me give you a
> strawman proposal for consideration/discussion:
> 
> (1) An (ioctl-based) interface which allows a privileged program to
> specify one or more ranges of blocks which the filesystem's block
> allocator must NOT allocate from.  (We may want to have a flag for
> each block range which either makes the block lockout advisory, such
> that if the block allocator can't find blocks anywhere else, it may
> invade the reserved block area --- or mandatory, where if there are no
> other blocks, the filesystem returns ENOSPC).  This allows the
> defragmenter to work on an area of the disk without worrying about
> concurrent allocations by other processes getting in the way.
> 
> (2) An (ioctl-based) interface which associates with an inode
> preferred range(s) of blocks which the block allocator will try using
> first; if those blocks are not available, or the block range(s) is
> exhausted, the block allocator uses its normal algorithms to pick the
> best available block.  The set of preferred blocks is only guaranteed
> to persist while the inode is in memory.
> 
> (3) An (ioctl-based) interface which takes two inode numbers, and
> allows a privileged program to "defrag" an inode by using blocks from
> a donor inode and using them as the new blocks for the destination
> inode, preserving the contents of the destination inode.
> 
> The advantage of this implementation strategy is that each of the
> interfaces can be implemented one at a time, with very well defined
> semantics, and which can be independently tested.  The semantics can
> also be used in different combinations to solve alternate problems.
> For example, a combination of (1) and (2) can be used to reserve
> blocks for use by a directory that is expected to grow, so the
> directory can use contiguous blocks.  Or, they could be used to
> implement an "online shrink" that would allow a filesystem to be
> resized to a smaller size.
> 
> One other thing that comes to mind.  If it turns out that these
> interfaces have multiple users, and in some cases the reservations or
> block allocation restrictions are expected to last for longer than a
> process lifetime, it may be useful to tag them with a short (8-16
> character) name, so that it is possible to list the current set of
> reservations, and so they can be removed by a privileged user.  This
> could be overdesigning the interface; but the whole *point* of
> thinking about the interfaces from a more generic point of view (as
> opposed to use by a specific program for which the kernel interfaces
> are custom-designed) is that hopefully they will have multiple use
> cases and multiple users, in which case we need to worry about how
> multiple users can co-exist.
> 
> Thoughts, comments?
> 
> 						- Ted
> 
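The toy user-space model of a block allocator below makes the semantics of interfaces (1) and (2) concrete. It is not ext4 code, and all names are hypothetical. The model makes three assumptions. First, an inode's preferred ranges take precedence over lockouts, which is what the directory-reservation example needs. Second, ordinary allocations avoid every lockout. Third, advisory lockouts are invaded only when no other block is free, while mandatory lockouts yield ENOSPC instead.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 64              /* size of the toy "filesystem" */

struct blk_range {
    unsigned start;             /* first block in the range */
    unsigned len;               /* number of blocks */
    bool mandatory;             /* lockouts only; ignored for preferences */
};

struct toy_fs {
    bool used[NBLOCKS];
    const struct blk_range *lockouts;   /* interface (1) */
    size_t nlockouts;
};

static bool in_range(const struct blk_range *r, unsigned b)
{
    return b >= r->start && b - r->start < r->len;
}

/* 0 = not locked out, 1 = advisory lockout, 2 = mandatory lockout */
static int lock_level(const struct toy_fs *fs, unsigned b)
{
    int level = 0;
    for (size_t i = 0; i < fs->nlockouts; i++)
        if (in_range(&fs->lockouts[i], b))
            level = fs->lockouts[i].mandatory ? 2 : (level > 1 ? level : 1);
    return level;
}

static int take(struct toy_fs *fs, unsigned b)
{
    fs->used[b] = true;
    return (int)b;
}

/* Allocate one block for an inode with the given preferred ranges
 * (interface (2)); returns the block number or -ENOSPC. */
int toy_alloc(struct toy_fs *fs, const struct blk_range *pref, size_t npref)
{
    /* (2) preferred ranges first; assumed to override lockouts */
    for (size_t i = 0; i < npref; i++)
        for (unsigned b = pref[i].start;
             b < NBLOCKS && b - pref[i].start < pref[i].len; b++)
            if (!fs->used[b])
                return take(fs, b);
    /* normal allocation: any free block outside every lockout */
    for (unsigned b = 0; b < NBLOCKS; b++)
        if (!fs->used[b] && lock_level(fs, b) == 0)
            return take(fs, b);
    /* (1) last resort: invade advisory lockouts, never mandatory ones */
    for (unsigned b = 0; b < NBLOCKS; b++)
        if (!fs->used[b] && lock_level(fs, b) == 1)
            return take(fs, b);
    return -ENOSPC;
}
```

Consider a mandatory lockout over blocks 0-7 that is also set as a directory's preferred range. Other allocations then start at block 8, while the directory's allocations fill blocks 0-7.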