Message-ID: <49AE2F1E.205@rs.jp.nec.com>
Date:	Wed, 04 Mar 2009 16:34:54 +0900
From:	Akira Fujita <a-fujita@...jp.nec.com>
To:	Theodore Tso <tytso@....edu>
CC:	linux-ext4@...r.kernel.org
Subject: [RFC] mballoc: Add ioctls for new block allocation policy

Hi Ted,

We will reconsider the implementation of force defragmentation mode (-f mode)
and relevant file defragmentation mode (-r mode), as you suggested in the
mail in December 2008:
 http://marc.info/?l=linux-ext4&m=122880166227883&w=2

These modes require the following two functions to be added to the ext4
multiblock allocator (mballoc).
We'd like to settle the interface for these functions, so any comments are
welcome.

  a. Block allocation restriction
     This is an ioctl interface which allows a privileged program to specify
     one or more ranges of blocks which the filesystem's block allocator
     must not allocate from.
     This allows ext4 online defrag to resolve free space fragmentation,
     which force defragmentation mode requires.
     This feature may also be useful for online shrink: first we restrict
     allocation from the tail of the filesystem, then move data away from
     there, and finally reduce the size of the filesystem.

  b. Preferred block allocation
     This is an ioctl interface which associates an inode with a preferred
     range of blocks which the block allocator will try to use first.
     It gives ext4 online defrag the following two features:
      1. Defragment files and re-allocate them close to each other
         (relevant file defragmentation mode needs this one).
      2. After resolving free space fragmentation, re-allocate a file to the
         contiguous free space (force defragmentation mode needs this one).
     It also makes it possible to allocate particular blocks to a file in
     advance with fallocate.

The following are the implementation approaches for the above two functions.

  a. Block allocation restriction (balloc restriction)
     For balloc restriction, we need to add ioctls, structures, and a member
     to an existing structure.

     Ioctls:
       EXT4_IOC_ADD_GLOBAL_ALLOC_RULE
				_IOW('f', 16, struct ext4_alloc_rule);

       This ioctl forbids block allocation from the range of blocks
       described by the ext4_alloc_rule. The rule is added to
       ext4_sb_info->s_bg_list (described later).
       When it is added, the filesystem-relative block range is converted
       into block-group-relative ranges.
       An added ext4_alloc_rule is removed by the following ioctl or by
       unmounting the filesystem.

       EXT4_IOC_CLR_GLOBAL_ALLOC_RULE
				_IOW('f', 17, struct ext4_alloc_rule);

       This ioctl permits the block allocator to allocate from the range of
       blocks described by the ext4_alloc_rule again.
       It modifies s_bg_list->range_list to make the range allocatable.

     Structures:
       * ext4_alloc_rule (describes the range of balloc restriction)
         struct ext4_alloc_rule {
		__u64 start;	  /* physical start offset in block */
		__u64 len;	  /* the length of the blocks range */
		__u32 alloc_flag; /* mandatory...0(default) advisory...1 */
         };

         "alloc_flag" defines the behavior when the block allocator can not
         get blocks because of the balloc restriction.
         In the "mandatory" case, we never allocate the blocks from "start"
         to "start + len". If block allocation fails because of the
         restriction, we return an error (ENOSPC).
         In the "advisory" case, on the other hand, we may allocate blocks
         from the restricted range.

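         As a rough illustration of how a privileged user-space tool might
         use this interface, here is a minimal sketch. It assumes the ioctl
         numbers and structure layout exactly as proposed above; the call
         only succeeds on a kernel that implements this proposal. The
         uint64_t/uint32_t types stand in for __u64/__u32, and the names
         make_rule, add_global_rule, clr_global_rule and the ALLOC_RULE_*
         constants are hypothetical helpers, not part of the proposal.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* User-space mirror of the proposed structure. */
struct ext4_alloc_rule {
	uint64_t start;      /* physical start offset in blocks */
	uint64_t len;        /* length of the block range */
	uint32_t alloc_flag; /* 0 = mandatory (default), 1 = advisory */
};

/* Request numbers as proposed in this mail. */
#define EXT4_IOC_ADD_GLOBAL_ALLOC_RULE _IOW('f', 16, struct ext4_alloc_rule)
#define EXT4_IOC_CLR_GLOBAL_ALLOC_RULE _IOW('f', 17, struct ext4_alloc_rule)

/* Hypothetical names for the two alloc_flag values. */
enum { ALLOC_RULE_MANDATORY = 0, ALLOC_RULE_ADVISORY = 1 };

/* Build a rule covering blocks [start, start + len). */
struct ext4_alloc_rule make_rule(uint64_t start, uint64_t len, uint32_t flag)
{
	struct ext4_alloc_rule r = { .start = start, .len = len, .alloc_flag = flag };

	assert(len > 0); /* an empty range is assumed to be meaningless */
	return r;
}

/* Forbid allocation from the range; returns 0, or -1 with errno set. */
int add_global_rule(int fd, const struct ext4_alloc_rule *rule)
{
	return ioctl(fd, EXT4_IOC_ADD_GLOBAL_ALLOC_RULE, rule);
}

/* Make the range allocatable again; returns 0, or -1 with errno set. */
int clr_global_rule(int fd, const struct ext4_alloc_rule *rule)
{
	return ioctl(fd, EXT4_IOC_CLR_GLOBAL_ALLOC_RULE, rule);
}
```

         Here fd would be an open descriptor on the target ext4
         filesystem; a defrag tool would add the rule, move data, and
         then clear the rule.
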
       * ext4_bg_list (the list of the bg relative range of balloc restriction)
         struct ext4_bg_list {
		struct list_head bg_list;     /* next ext4_bg_list */
		ext4_group_t	 bg_num;      /* bg num */
		ext4_grpblk_t	 used_blocks; /* forbidden blocks by the
						 restriction */
		struct list_head range_list;  /* The list of bg relative balloc
						 restriction */
         };

         This list manages the bg relative range of balloc restrictions (struct
         ext4_bg_alloc_rule).

       * ext4_bg_alloc_rule (the bg relative range of balloc restriction)
         struct ext4_bg_alloc_rule {
		struct list_head range_list; /* next ext4_bg_alloc_rule */
		ext4_grpblk_t	 start;      /* physical start offset
						in block */
		ext4_grpblk_t	 end;        /* physical last offset
						in block */
		int		 alloc_flag; /* mandatory...0(default)
						advisory...1 */
         };

         This structure stores the bg-relative range of a balloc restriction.
         The range passed by the ioctl is filesystem-relative, so it needs to
         be converted into bg-relative form.
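
         The conversion could look roughly like the sketch below. It
         assumes the usual ext4 mapping, group = (block - first data
         block) / blocks per group, and that "end" is inclusive. The
         struct bg_range and the function split_range are hypothetical
         names for illustration, not the proposed kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One block-group-relative piece of a filesystem-relative range. */
struct bg_range {
	uint32_t group; /* block group number */
	uint32_t start; /* first block, relative to the group */
	uint32_t end;   /* last block (inclusive), relative to the group */
};

/*
 * Split the filesystem-relative range [start, start + len) into per-group
 * pieces. Writes at most max_out entries and returns how many were written.
 */
size_t split_range(uint64_t start, uint64_t len, uint64_t first_data_block,
		   uint32_t blocks_per_group, struct bg_range *out,
		   size_t max_out)
{
	size_t n = 0;
	uint64_t rel;

	assert(blocks_per_group > 0 && start >= first_data_block);
	rel = start - first_data_block;

	while (len > 0 && n < max_out) {
		uint64_t off = rel % blocks_per_group;
		uint64_t room = blocks_per_group - off; /* blocks left in group */
		uint64_t take = len < room ? len : room;

		out[n].group = (uint32_t)(rel / blocks_per_group);
		out[n].start = (uint32_t)off;
		out[n].end = (uint32_t)(off + take - 1);
		n++;

		rel += take;
		len -= take;
	}
	return n;
}
```

         For example, with 32768 blocks per group, the range starting at
         block 32000 with length 1000 becomes blocks 32000-32767 of group
         0 and blocks 0-231 of group 1.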

     A new member of the structure:
       We add the new member to the ext4_sb_info.
       struct ext4_sb_info {
		...
	+	struct list_head s_bg_list;
       }

     Behavior in mballoc:
       The free block lookup routines (ext4_mb_{simple, complex}_scan_group,
       ext4_mb_scan_aligned, etc.) compare the bg-relative balloc
       restriction list with the range of free blocks found. If the free
       blocks range overlaps a restricted range, we shorten the free blocks
       range or do the lookup again.
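
       The shortening step could be sketched as follows. This is only an
       illustration under simplifying assumptions: the restriction list
       for the group is a sorted, non-overlapping array rather than a
       list_head, and every entry is treated as mandatory. struct bg_rule
       and trim_free_extent are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified bg-relative restriction: blocks start..end inclusive. */
struct bg_rule {
	int start;
	int end;
};

/*
 * Trim the free extent [*start, *start + len) so that it does not overlap
 * any restricted range. On success, updates *start and returns the usable
 * length. Returns 0 if nothing usable is left, in which case the caller
 * would do the lookup again elsewhere.
 */
int trim_free_extent(const struct bg_rule *rules, size_t n, int *start, int len)
{
	int s, e;

	assert(start != NULL && len > 0);
	s = *start;
	e = s + len - 1;

	for (size_t i = 0; i < n; i++) {
		if (rules[i].end < s)
			continue;        /* restriction lies before the extent */
		if (rules[i].start > e)
			break;           /* restriction lies after the extent */
		if (rules[i].start <= s) {
			s = rules[i].end + 1; /* head is restricted: skip it */
			if (s > e)
				return 0;
		} else {
			e = rules[i].start - 1; /* tail is restricted: cut it */
			break;
		}
	}
	*start = s;
	return e - s + 1;
}
```

       With restrictions on blocks 0-12 and 20-25, a free extent of 20
       blocks starting at block 10 is shortened to the 7 blocks 13-19.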

  b. Preferred block allocation
     For preferred block allocation, we add an ioctl, structures, and
     a member to an existing structure.

     Ioctl:
       EXT4_IOC_ADD_INODE_ALLOC_RULE	_IOW('f', 18, struct ext4_alloc_rule);
       This ioctl sets the preferred range of blocks (struct ext4_alloc_rule)
       on the inode. The range is cancelled when the block allocation is
       done or the fd is closed.

     Structure:
       * ext4_alloc_rule (describes the preferred range; same structure as
         in a.)
         struct ext4_alloc_rule {
		__u64 start;	  /* physical start offset in block */
		__u64 len;	  /* the length of the blocks range */
		__u32 alloc_flag; /* mandatory...0(default) advisory...1 */
         };

         If allocation from the requested blocks fails, the "mandatory" case
         returns ENOSPC, while the "advisory" case tries to allocate from
         elsewhere.

      * ext4_inode_alloc_rule (stores the allocation rule and the pid which
        set the rule)
        struct ext4_inode_alloc_rule {
		struct ext4_alloc_rule *alloc_rule;
		pid_t alloc_pid;
        };

        alloc_rule: Stores the contents of ext4_alloc_rule from the ioctl.
        alloc_pid: The pid of the process which sets "alloc_rule"

     A new member of the structure:
       We add the new member to struct ext4_inode_info.
       struct ext4_inode_info {
		...
	+	struct ext4_inode_alloc_rule *i_alloc_rule;
		spinlock_t i_block_reservation_lock;
       }

     Behavior in mballoc:
       If current->pid differs from ext4_inode_info->i_alloc_rule->alloc_pid,
       the ordinary multiblock routine is executed. Otherwise, the block
       allocator behaves as follows:

       When doing multiblock allocation, it sets ext4_allocation_request->
       {goal, len, flags} from the contents of struct ext4_alloc_rule.
       Then, the requested blocks are allocated via the existing mballoc
       process.
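
       A minimal sketch of this decision, outside the kernel, might look
       like the following. The structures are simplified stand-ins for
       ext4_alloc_rule, ext4_inode_alloc_rule and ext4_allocation_request,
       and the SKETCH_* flag values are placeholders invented for this
       example; how "mandatory" and "advisory" map onto real mballoc hint
       flags is an assumption, not something decided above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

struct alloc_rule {
	uint64_t start;      /* preferred physical start block */
	uint64_t len;        /* preferred length in blocks */
	uint32_t alloc_flag; /* 0 = mandatory, 1 = advisory */
};

struct inode_alloc_rule {
	const struct alloc_rule *alloc_rule;
	pid_t alloc_pid;     /* process which set the rule */
};

struct alloc_request {
	uint64_t goal;
	uint64_t len;
	unsigned int flags;
};

/* Placeholder hint flags for this sketch only. */
#define SKETCH_GOAL_ONLY 0x1u /* fail rather than allocate elsewhere */
#define SKETCH_TRY_GOAL  0x2u /* fall back to other blocks if needed */

/*
 * If the caller is the process that set the rule, copy the rule into the
 * request and return 1. Otherwise leave the request untouched and return 0,
 * so that the ordinary allocation path is used.
 */
int apply_inode_rule(const struct inode_alloc_rule *ir, pid_t caller,
		     struct alloc_request *ar)
{
	assert(ar != NULL);
	if (ir == NULL || ir->alloc_rule == NULL || ir->alloc_pid != caller)
		return 0;

	ar->goal = ir->alloc_rule->start;
	ar->len = ir->alloc_rule->len;
	ar->flags |= (ir->alloc_rule->alloc_flag == 0) ? SKETCH_GOAL_ONLY
						       : SKETCH_TRY_GOAL;
	return 1;
}
```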

Best regards,
Akira Fujita
