Message-Id: <9500D51F-7E89-41F2-9A77-0E1A79136240@dilger.ca>
Date:	Fri, 10 Jun 2011 14:37:11 -0600
From:	Andreas Dilger <adilger@...ger.ca>
To:	Phillip Susi <psusi@....rr.com>
Cc:	linux-ext4@...r.kernel.org
Subject: Re: 64bit filesystem questions

On 2011-06-10, at 11:45 AM, Phillip Susi wrote:
> On 6/10/2011 1:29 PM, Andreas Dilger wrote:
>> On 2011-06-10, at 11:14 AM, Phillip Susi wrote:
>>> On 6/10/2011 12:19 PM, Andreas Dilger wrote:
>>>> I think in the presence of flex_bg this issue is moot.
>>> 
>>> What is the issue without flex_bg?
>> 
>> No "issue" really, just that the block/inode bitmaps are spread all over
>> the filesystem.  The original discussion was about whether there could be
>> "larger bitmaps that addressed more than 32768 blocks", which is essentially
>> what the flex_bg feature provides.  With flex_bg the bitmaps for different
>> groups will be allocated adjacent to each other on disk, and allow addressing
>> more than 32768 blocks without any seeking.
>> 
>> On large filesystems without flex_bg, the distribution of the bitmaps without
>> flex_bg means that a seek is needed to read each one, and given that spinning
>> disks have stayed at about 100 seeks/sec for decades it means 10+ minutes just
>> to read all of the bitmaps.
>> 
>> On my 2TB 5400 RPM SATA drive, e2fsck time went from ~20 minutes to ~3 minutes
>> by copying the data to a new ext4 filesystem with flex_bg + extents.  For a
>> fair comparison, I then reformatted the original (identical) disk without
>> flex_bg or extents and copied the data back, so that there wasn't any unfair
>> comparison between the newly-formatted filesystem and the old fragmented one.
> 
> I know what flex_bg is; what I don't understand is what it has to do with the limit on the size of a block group.  Whether the block bitmaps are stored in their native block group, or clustered up with flex_bg does not seem to have anything to do with whether or not the size of the bitmap can exceed 32k blocks.

I hope it is obvious that a single bitmap block can only address the number
of bits (==blocks) that fit within that block.  To address more blocks the
block bitmap needs to be larger than a single block in size.  One possible
way to do this (discussed early on for ext4) would be to have N block
bitmap blocks per group.  That raises issues of how to address those blocks
for each "block group", and what the meaning of a "block group" really is.

The other (very similar, but not identical) approach is to essentially merge
N adjacent "block groups" into a single "large block group" that has N block
bitmaps, and addresses N * blocksize * 8 blocks per "large block group".
In this case "N" is the flex_bg factor (constrained to 2^n), and the "large
block group" is called a "flex group".  It achieves exactly the same thing
as having N block bitmaps per group; the only difference is that there are
N group descriptors that point to the bitmaps, and the bitmaps no longer have
to be located within the groups themselves.
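
To put numbers on the addressing described above, here is a rough sketch of
the arithmetic.  The function names are made up for illustration and are not
taken from the kernel or e2fsprogs, and it assumes the first data block is
block 0 (as with 4096-byte blocks) so that group boundaries fall on exact
multiples of the group size.

```python
def blocks_per_group(block_size: int) -> int:
    """One bitmap block holds block_size * 8 bits, one bit per block."""
    return block_size * 8


def blocks_per_flex_group(block_size: int, flex_factor: int) -> int:
    """A flex group of N groups addresses N * blocksize * 8 blocks."""
    # The flex_bg factor is constrained to a power of two.
    if flex_factor < 1 or flex_factor & (flex_factor - 1):
        raise ValueError("flex_factor must be a power of two")
    return flex_factor * blocks_per_group(block_size)


def locate_block(block_nr: int, block_size: int, flex_factor: int):
    """Return (flex group, bitmap index within flex group, bit index).

    Hypothetical helper: assumes block 0 is the first block of group 0.
    """
    per_group = blocks_per_group(block_size)
    group = block_nr // per_group
    return (group // flex_factor, group % flex_factor, block_nr % per_group)
```

With 4096-byte blocks, blocks_per_group(4096) gives 32768 and
blocks_per_flex_group(4096, 4) gives 131072.  Block 40000 lands in flex
group 0, in the second of its four bitmaps, at bit 7232.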

There is virtually no difference between "larger bitmap" and "flex_bg":

"b"=block bitmap, "i"=inode bitmap, "."=data block

Non-flex_bg configuration for 4 groups * 32768 blocks:

bi...{32760}...bi...{32760}...bi...{32760}...bi...{32760}...

Each block bitmap addresses 32768 blocks in total (including itself).

flex_bg configuration for the same 4 groups * 32768 blocks:

bbbbiiii.....................{131020}.......................

If you treat the four "bbbb" blocks as a single block bitmap, and "iiii"
as a single inode bitmap, and the contiguous range of data blocks as a
single group, it is exactly what you are asking for - a larger bitmap.
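
The two diagrams can also be modelled in a few lines.  This is only a sketch
of the simplified layout drawn above: it ignores superblocks, group
descriptors and inode tables, and the function names are invented for the
example.

```python
def block_bitmap_locations(num_groups, blocks_per_group, flex_factor=1):
    """Block numbers of the block bitmaps in the simplified layout.

    flex_factor=1 models the non-flex_bg case (bitmap at the start of
    each group); a larger factor packs the bitmaps of each flex group
    together at the start of that flex group.
    """
    locations = []
    for group in range(num_groups):
        flex_start = (group // flex_factor) * flex_factor * blocks_per_group
        locations.append(flex_start + group % flex_factor)
    return locations


def count_seeks(locations):
    """Count contiguous runs of block numbers; each run costs one seek."""
    ordered = sorted(locations)
    return sum(
        1
        for i, blk in enumerate(ordered)
        if i == 0 or blk != ordered[i - 1] + 1
    )


def data_blocks(num_groups, blocks_per_group):
    """Blocks left over after one block bitmap and one inode bitmap per group."""
    return num_groups * (blocks_per_group - 2)
```

For the 4 groups * 32768 blocks case, the non-flex_bg bitmaps sit at blocks
0, 32768, 65536 and 98304 (4 seeks), while with a flex factor of 4 they sit
at blocks 0-3 (1 seek).  Either way 131064 blocks are left for data, which
is what the dots and braces in both diagrams add up to.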

Cheers, Andreas




