linux-ext4 - Re: [RFC] on-line resize with flex_bg and exclude

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BANLkTi=VjfQeH9TLBMJ9DpMwfzxA070VbQ@mail.gmail.com>
Date:	Wed, 13 Apr 2011 17:26:02 +0300
From:	Amir Goldstein <amir73il@...il.com>
To:	Andreas Dilger <adilger@...ger.ca>
Cc:	Theodore Tso <tytso@....edu>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [RFC] on-line resize with flex_bg and exclude_bitmap

On Wed, Apr 13, 2011 at 4:45 PM, Andreas Dilger <adilger@...ger.ca> wrote:
> On 2011-04-12, at 7:27 AM, Amir Goldstein wrote:
>> I realized another issue regarding exclude bitmap compatibility.
>> the exiting EXT4_IOC_GROUP_ADD ioctl doesn't pass a field for the location
>> of exclude_bitmap block, so we need to either allocate exclude_bitmap in kernel
>> or define a new ioctl, which passes the exclude_bitmap to kernel.
>>
>> If we are going to go with the latter solution, we may want to add
>> support for flex_bg layout for the new ioctl, so following is my proposal
>> for EXT4_IOC_FLEX_GROUP_ADD:
>>
>> 1. As far as online resize and mkfs are concerned, we always allocate all
>> group descriptors of a flex bg at the same time.
>>
>> 2. If there is not enough space for all flex_bg metadata in the last group,
>> the last group will be dropped.
>>
>> 3. The new flex group input will assume all bitmaps of the same type are
>> consecutive, so only the address of the first bitmap is needed as input.
>>
>> struct ext4_new_flex_group_input {
>>      __u32 group;            /* Group number for this data */
>
> This field will cause a misalignment/padding in the structure and will cause
> problems with 32-bit binaries on 64-bit kernels.
>
> Should we add a "__u32 flags" field to this structure?  I prefer flags/features
> instead of versions...  This will avoid the need to continually adding new
> ioctls in the future.

Agreed.
I just copy pasted the current ext4_new_group_input with its existing flaws.

>
>>      __u64 block_bitmap;     /* Absolute block number of first
>> block bitmap */
>>      __u64 exclude_bitmap;     /* Absolute block number of first
>> exclude bitmap */
>>      __u64 inode_bitmap;     /* Absolute block number of first
>> inode bitmap */
>>      __u64 inode_table;      /* Absolute block number of first
>> inode table start */
>>      __u32 blocks_count;     /* Total number of blocks in this flex group */
>>      __u16 reserved_blocks;  /* Number of reserved blocks in this
>> flex group */
>>      __u16 flex_size;            /* Number of groups in the flex group */
>> };
>
>
> Since the kernel only supports a single s_log_groups_per_flex value in the superblock, we should use the superblock value and change "flex_size" to be just a count of the number of groups that should be added.  That means resize2fs should align its ioctls to multiples of the superblock s_log_groups_per_flex.
>

Agreed, let's call it group_count.
In most cases, resize2fs would use the new ioctl to add *exactly* 1 flex group,
with all its bitmaps and descriptors, just as the current ioctl always
adds 1 group,
even if not all group is made available at that time.

resize2fs will need to use group_count < groups_per_flex to align
existing fs to flex group.
in case existing fs has 1 group and online resize extends it to 16 groups,
the bitmaps for group 1-15 will all be placed in group 1.
does that make sense? or would it be better to extend the fs 1 group
at a time until
flex group boundary in that case?



>
>> 4. ext4_group_extend() should be the same except we need to allow extending
>> within the last flex bg, but not necessarily the last block group.
>>
>>
>> To look at this from a different angle, if you imagine that the flex
>> group is just a big group, whose bitmaps and group descriptor are flex_size
>> times bigger and where ext4_new_flex_group_input encodes the info of the big
>> descriptor, then this design is identical to the current implementation of
>> online resize.
>>
>> What I will do, if you agree to this design, is use the new ioctl,
>> with flex_size = 1 to pass the exclude_bitmap in online resize and enforce
>> flex_size == 1 in kernel.
>
> If you are going to be modifying the kernel to add support for this ioctl,
> why not properly add in support for flex_size > 1?

because I don't have time to do it (test it) now :-(
and I want to post the e2fsprogs exclude bitmap patch as soon as possible.

> It should only involve
> the kernel looping over the groups as they are added.  This doesn't require
> that resize2fs has added support for it yet, but forcing flex_size == 1 in
> the kernel means that userspace will not know whether the kernel is doing
> the right thing or not.
>

if I implement group_count > 1 in the kernel, I will need to implement
it in e2fsprogs,
so I can test it properly.
hopefully, there will be no official kernel which enforces group_count == 1.

> In order to handle future resize in the case of a partially-added flex group,
> it would be desirable for the unused blocks (bitmaps, inode table) are reserved
> so that they are not allocated by the block allocator.
>

Yes, that would be desirable, but then again, we will anyway need to
handle resize
of fs which was formatted/resized before my patches.
Anayway, I hope I will implement proper flex_bg resize for 2.6.40, I'm
just not sure I can commit to it.

>> Then later we can teach resize2fs and the kernel to extend flex groups properly
>> using the new ioctl.
>>
>> What do you think?
>> Can you think of another way I can support exclude_bitmap and online resize,
>> without the need for a new ioctl?
>>
>> Amir.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
> Cheers, Andreas
>
>
>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html