
Message-ID: <4F63433B.1020904@ubuntu.com>
Date:	Fri, 16 Mar 2012 09:42:19 -0400
From:	Phillip Susi <psusi@...ntu.com>
To:	Andreas Dilger <adilger@...ger.ca>
CC:	ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: Status of META_BG?

On 3/15/2012 5:06 PM, Andreas Dilger wrote:
>> To get an fs that large, you have to enable 64bit support, which also means you can pass the limit of 32k blocks per group.
>
> I'm not sure what you mean here.  Sure, there can be more than 32k
> blocks per group, but there is still only a single block bitmap per
> group so having more blocks is dependent on a larger blocksize.

Heh, I'm not sure what you mean here.  What does the block bitmap have 
to do with anything?  I thought the issue was that the block group 
descriptor table exceeds the size of a single block group, because each 
group is limited to 128 MB and so there are a huge number of them.
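
For a rough sense of the numbers, here is a back-of-the-envelope sketch of 
when the descriptor table outgrows a single group.  It assumes one bitmap 
block per group (so 8 * block size blocks per group, as in the quoted 
point above) and ignores the superblock and other metadata that share the 
group, so it is an upper bound.  The helper name is made up for 
illustration and is not e2fsprogs code.

```c
#include <stdint.h>

/*
 * Largest filesystem (in bytes) whose group descriptor table still fits
 * inside one block group.  Hypothetical helper, not e2fsprogs code.
 * Assumes one block bitmap block per group, i.e. 8 * block_size blocks.
 */
static uint64_t max_fs_bytes_without_meta_bg(uint64_t block_size,
                                             uint64_t desc_size)
{
        uint64_t blocks_per_group = 8 * block_size;     /* bits in one bitmap block */
        uint64_t group_bytes = blocks_per_group * block_size;
        uint64_t max_groups = group_bytes / desc_size;  /* descriptors that fit */

        return max_groups * group_bytes;
}
```

With 4096-byte blocks a group is 128 MiB, so 32-byte descriptors give 
about 512 TiB and 64-byte descriptors about 256 TiB under these 
assumptions.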

>>   Doing that should allow for a much more reasonable number of groups (which is a good thing for several reasons), and would also solve this problem, wouldn't it?
>
> Possibly in conjunction with BIGALLOC.

BIGALLOC?

>> So it puts one GD block at the start of every several block groups?
>
> One at the start of the first group, the second group, and the last
> group.

You mean one copy of the whole table?  That's not what the current code 
in e2fsprogs looks like it does to me.  openfs.c has:

> blk64_t ext2fs_descriptor_block_loc2(ext2_filsys fs, blk64_t group_block,
>                                      dgrp_t i)
> {
>         int     bg;
>         int     has_super = 0;
>         blk64_t ret_blk;
>
>         if (!(fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG) ||
>             (i < fs->super->s_first_meta_bg))
>                 return (group_block + i + 1);
>
>         bg = EXT2_DESC_PER_BLOCK(fs->super) * i;
>         if (ext2fs_bg_has_super(fs, bg))
>                 has_super = 1;
>         ret_blk = ext2fs_group_first_block2(fs, bg) + has_super;

That appears to map the GDT block number to a block group based on how 
many group descriptors fit in a block, so there's one GDT block every 
several block groups.  The subsequent code then checks if it is being 
asked for a backup and shifts the result over by one whole block group, 
so it looks like there is exactly one backup, whose blocks are each 
stored in the block group following the one that holds the corresponding 
primary GDT block.
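
To make that reading concrete, here is a simplified standalone model of 
the mapping as I describe it above.  It is a sketch, not the e2fsprogs 
function: the names are made up, it assumes sparse_super decides which 
groups carry a superblock copy, and it leaves out the bounds check 
against the end of the filesystem.

```c
#include <stdint.h>

/* Returns 1 if n is a power of base (base^1 or higher, or 1), else 0. */
static int is_power_of(uint64_t n, uint64_t base)
{
        while (n > 1 && n % base == 0)
                n /= base;
        return n == 1;
}

/* Assumed sparse_super rule: copies in groups 0, 1 and powers of 3, 5, 7. */
static int group_has_super(uint64_t group)
{
        if (group <= 1)
                return 1;
        return is_power_of(group, 3) || is_power_of(group, 5) ||
               is_power_of(group, 7);
}

/*
 * Simplified model of the META_BG lookup: gdt_index is the index of the
 * GDT block, backup selects the copy stored one block group later.
 * Hypothetical helper for illustration only.
 */
static uint64_t meta_bg_gdt_block(uint64_t gdt_index, uint64_t desc_per_block,
                                  uint64_t blocks_per_group,
                                  uint64_t first_data_block, int backup)
{
        uint64_t group = gdt_index * desc_per_block;
        uint64_t blk = first_data_block + group * blocks_per_group;

        blk += group_has_super(group);   /* skip the superblock copy, if any */
        if (backup)
                blk += blocks_per_group; /* same offset, next block group */
        return blk;
}
```

With 128 descriptors per block and 32768 blocks per group, GDT block 1 
lands at the start of group 128 (block 4194304) and its backup at the 
start of group 129, which is the "one GDT block every several block 
groups" layout.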

>>   Wouldn't that drastically slow down opening/mounting the fs since the disk has to seek to every block group?
>
> Yes, definitely.  That wasn't a concern before flex_bg arrived, since
> that seek was needed for every group's block/inode bitmap as well.

But you don't need to scan every bitmap at mount time, do you?  Aren't 
they loaded on demand when the group is first accessed?  You do, 
however, need to read all of the group descriptors at mount time.

> Maybe with bigalloc the number of groups is reduced, and the size
> of the groups is increased, which helps two ways.  First, fewer
> groups means fewer GD blocks, and larger groups mean more GD blocks
> can fit into the 0th and 1st groups.

That's what I was talking about.  I'm not sure what bigalloc is, but 
once you enable 64bit, that gets you the ability to have more than 32768 
blocks per group, so you have fewer groups and more room in them.

> Well, the "mke2fs -S" is only applying a best guess estimate of the
> metadata location using default parameters.  If the default parameters
> are not identical (e.g. flex_bg on/off, bigalloc on/off, etc) then
> "mke2fs -S" will only corrupt an already-fatally-corrupted filesystem,
> and you need to start from scratch.

That's true of mke2fs -S, but you could do the same thing while 
consulting the existing superblock to determine the parameters.  I 
believe that all parameters that affect the contents of the GDT can be 
found in the superblock: specifically, block size, blocks per group, and 
flex factor.  Given that information, e2fsck should be able to rebuild 
the GDT.
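
As a sketch of the first step of such a rebuild, the group geometry 
follows directly from a handful of superblock values.  The struct below 
is a hypothetical subset with illustrative field names, not the real 
superblock layout.  Locating the bitmaps and inode tables, which is 
where the flex factor matters, is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical subset of superblock values; field names are illustrative. */
struct sb_geometry {
        uint64_t blocks_count;      /* total blocks in the filesystem */
        uint64_t first_data_block;  /* 0 for block sizes above 1 KiB */
        uint32_t block_size;        /* bytes */
        uint32_t blocks_per_group;
        uint32_t desc_size;         /* bytes per group descriptor */
};

static uint64_t div_round_up(uint64_t a, uint64_t b)
{
        return (a + b - 1) / b;
}

/* Number of block groups, counting a partial last group. */
static uint64_t group_count(const struct sb_geometry *sb)
{
        return div_round_up(sb->blocks_count - sb->first_data_block,
                            sb->blocks_per_group);
}

/* Number of blocks needed to hold one descriptor per group. */
static uint64_t gdt_block_count(const struct sb_geometry *sb)
{
        uint64_t desc_per_block = sb->block_size / sb->desc_size;

        return div_round_up(group_count(sb), desc_per_block);
}

/* First block of a given group. */
static uint64_t group_first_block(const struct sb_geometry *sb, uint64_t group)
{
        return sb->first_data_block + group * sb->blocks_per_group;
}
```

For a 16 TiB filesystem with 4096-byte blocks, 32768 blocks per group and 
32-byte descriptors, this gives 131072 groups and 1024 GDT blocks.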
