linux-ext4 - Updated heuristics for mke2fs on large filesystems

Message-Id: <97674ACD-2827-443F-98C8-A43B39613229@dilger.ca>
Date:	Thu, 9 Jun 2011 17:55:06 -0600
From:	Andreas Dilger <adilger@...ger.ca>
To:	Ted Ts'o <tytso@....edu>
Cc:	ext4 List <linux-ext4@...r.kernel.org>
Subject: Updated heuristics for mke2fs on large filesystems

As discussed previously, there was interest in soliciting input on
changing the default parameters to mke2fs so that it handles newer
disks better out of the box, instead of expecting users to know the
right tunables to pass.

Some of the issues proposed were:

- higher inode ratio (up to 1MB for large LUNs)

 With multi-TB drives, and modern media files, the average file size
 in large filesystems is much larger than the default of 16kB/inode.
 The "uninit_bg" feature keeps a high-watermark for inode table usage,
 but with large inode tables, errors in the group descriptor checksum
 can cause major slowdowns in e2fsck.  Also, it takes a serious amount
 of time to format the filesystem when zeroing the inode tables, if
 the kernel doesn't support automatic itable zeroing.

- flex_bg aligned to s_raid_stride, with aligned inode tables/bitmaps

 Newer versions of mke2fs automatically detect the underlying geometry
 of the device (if available).  This is used to set the s_raid_stride
 and s_raid_stripe_width values in the superblock, which help align
 the block/inode bitmaps on non-flex_bg filesystems.  For flex_bg
 filesystems it would make sense to make the flex_bg factor equal to
 s_raid_stripe_width, so that the block/inode bitmaps can be
 sized/aligned on RAID stripe or SSD erase block boundaries.

- ability to specify journal offset directly

 This is useful for aligning the journal on RAID boundaries, or for
 placing it within an SSD portion of the filesystem, if desired.

- larger journal size

 There is data indicating that a larger journal can improve IO
 performance with many concurrent threads.  This needs to be balanced
 against the journal consuming too much RAM on systems that have
 little of it.

- lower reserved space ratio

 Some people feel that reserving 5% of very large filesystems wastes too
 much space, and the reserved space ratio should be capped at some limit
 regardless of how large the filesystem is.
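
To put rough numbers on the inode ratio and reserved space items
above, here is a small back-of-the-envelope sketch.  It is not mke2fs
code.  The 256-byte inode size, the 16TiB example filesystem, and the
10GiB reservation cap are assumptions picked for illustration only,
and the helper names are hypothetical.

```python
# Rough arithmetic only; none of this is taken from mke2fs.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB

INODE_SIZE = 256  # bytes per on-disk inode; an assumed value


def inode_table_bytes(fs_bytes, bytes_per_inode, inode_size=INODE_SIZE):
    """Total space used by inode tables for a given inode ratio."""
    inode_count = fs_bytes // bytes_per_inode
    return inode_count * inode_size


def reserved_bytes(fs_bytes, percent=5, cap_bytes=None):
    """Reserved space at `percent`, optionally limited by a hypothetical cap."""
    reserved = fs_bytes * percent // 100
    if cap_bytes is not None:
        reserved = min(reserved, cap_bytes)
    return reserved


if __name__ == "__main__":
    fs = 16 * TIB  # example filesystem size (assumption)

    # Inode table footprint at the current default and the proposed maximum.
    for ratio in (16 * KIB, 1 * MIB):
        table = inode_table_bytes(fs, ratio)
        print(f"{ratio // KIB:5d} kB/inode: {table / GIB:.0f} GiB of inode tables")

    # Reserved space with and without an illustrative 10GiB cap.
    print(f"5% reserved:         {reserved_bytes(fs) / GIB:.1f} GiB")
    print(f"5% capped at 10GiB:  {reserved_bytes(fs, cap_bytes=10 * GIB) / GIB:.1f} GiB")
```

Under these assumptions, a 16TiB filesystem carries 256GiB of inode
tables at 16kB/inode versus 4GiB at 1MB/inode, which is the space that
has to be zeroed at format time and potentially scanned by e2fsck.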

Cheers, Andreas
