linux-ext4 - [RFC 0/2] ext4: zero uninitialized inode tables

Message-Id: <20081121102309.182113793@bull.net>
Date:	Fri, 21 Nov 2008 11:23:09 +0100
From:	<Solofo.Ramangalahy@...l.net>
To:	linux-ext4@...r.kernel.org
Subject: [RFC 0/2] ext4: zero uninitialized inode tables

The time to format a filesystem is mostly linear with filesystem size.

The exact time spent on formatting depends on hardware and software, but
it is mainly explained by the zeroing of some blocks (the inode and block
bitmaps and the inode tables).
While the mkfs time can be considered negligible (for example, compared
to the RAID formatting of disk arrays), it is significant compared
to the formatting time of other filesystems.
This is noticeable when running performance comparison tests, or
tests that format the same device multiple times.
It may become prohibitive for large disks (arrays).
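
To give a feel for why the zeroing cost scales with filesystem size, here is
a back-of-the-envelope sketch that estimates how many bytes of inode tables
mkfs has to write. The function name is made up for illustration, and the
defaults (one inode per 16384 bytes, 256-byte inodes) are assumptions, not
values taken from the measurements below; check mke2fs.conf on your system.

```shell
#!/bin/sh
# Rough estimate of the inode table bytes that mkfs must zero.
# ASSUMPTIONS (hypothetical defaults, not from this message):
#   inode_ratio = 16384 bytes of filesystem space per inode
#   inode_size  = 256 bytes per on-disk inode
itable_bytes() {
    fs_bytes=$1
    inode_ratio=${2:-16384}
    inode_size=${3:-256}
    # number of inodes, times the size of each inode
    echo $(( fs_bytes / inode_ratio * inode_size ))
}

# Example: a 1 TiB filesystem with the assumed defaults
itable_bytes $(( 1024 * 1024 * 1024 * 1024 ))   # prints 17179869184 (16 GiB)
```

With these assumed defaults the inode tables take 1/64 of the device, so the
amount to zero grows in direct proportion to the filesystem size.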

For some measurements, see:
http://www.bullopensource.org/ext4/20080909-mkfs-speed-lazy_itable_init/
http://www.bullopensource.org/ext4/20080911-mkfs-speed-lazy_itable_init/
http://www.bullopensource.org/ext4/20080912-mkfs-speed-lazy_itable_init/
So far the format time stays under one hour; further measurements would
be needed, for example for 16TB filesystems.

It is possible to skip the initialization of the inode table blocks
with the mkfs option "lazy_itable_init" (mkfs.ext4(8)).
However, this option is not safe with respect to fsck, as there is no
way to distinguish between an uninitialized block filled with old bits
and a corrupted one.
(The use of lazy_itable_init could be considered safe in the case where
the blocks of the disk, in particular those used by the inode tables,
are prefilled with zeros.)
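
For the prefilled-with-zeros case, a quick sanity check on an image file or
device could look like the sketch below. This is only an illustration: the
function name is hypothetical, and you have to supply the byte offset and
length of the region yourself (for example from the inode table locations
reported by dumpe2fs).

```shell
#!/bin/sh
# Succeeds (exit status 0) if LENGTH bytes starting at byte OFFSET of FILE
# are all zero. Hypothetical helper, not part of the patches.
# Caveat: if the file is shorter than OFFSET+LENGTH, only the bytes that
# exist are checked.
region_is_zeroed() {
    file=$1; offset=$2; length=$3
    # tail -c +K starts output at byte K (1-based), head -c limits the length,
    # tr drops every NUL byte, wc counts whatever is left (non-zero bytes).
    nonzero=$(tail -c +$(( offset + 1 )) "$file" \
              | head -c "$length" | tr -d '\000' | wc -c)
    [ "$nonzero" -eq 0 ]
}
```

Usage would be along the lines of
`region_is_zeroed /tmp/ext4fs.img 0 4096 && echo zeroed`.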

These patches (try to) initialize the inode tables after mount via a
kernel thread that is launched when a module is loaded. The goal is to
find a tradeoff between speed and safety.

Apart from use in testing, another use case could be a distribution
installation: since device size grows faster than system size, the
share of the installation time spent on formatting will increase.
Since the system will use only a fraction of the full device (say
10GB for system installation on a 1TB disk), it would not be strictly
necessary to initialize all the inode tables before starting the
installation, for example for the home partition.

So far, I've only been able to initialize some small filesystems with
this code (using 2.6.28-rc4).
For example, like this:

. dd if=/dev/zero of=/tmp/ext4fs.img bs=1M count=1024
. losetup /dev/loop0 /tmp/ext4fs.img
. mkfs.ext4 -O^resize_inode -Elazy_itable_init /dev/loop0
. mount /dev/loop0 /mnt/test-ext4
. [dumpe2fs /dev/loop0]
. modprobe ext4_itable_init
. [dumpe2fs /dev/loop0  # here check the ITABLE_ZEROED]
. umount /mnt/test-ext4
. [dumpe2fs /dev/loop0]
. [fsck /dev/loop0]
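
The ITABLE_ZEROED check in the steps above can be done by eye on small
filesystems. For larger ones, a small filter like the following sketch may
help. It assumes that dumpe2fs prints the group flags in brackets on each
"Group N:" line; the function name is made up, and the output format may
differ between e2fsprogs versions.

```shell
#!/bin/sh
# Reads dumpe2fs output on stdin and prints "<zeroed> <total>":
# the number of groups flagged ITABLE_ZEROED and the total number of groups.
# ASSUMPTION: flags appear on the "Group N: ..." line of each group.
itable_zeroed_summary() {
    awk '/^Group [0-9]+:/ { total++; if ($0 ~ /ITABLE_ZEROED/) zeroed++ }
         END { printf "%d %d\n", zeroed, total }'
}
```

For example: `dumpe2fs /dev/loop0 2>/dev/null | itable_zeroed_summary`,
run before and after loading the module.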

But I also hit several bugs and managed to somehow screw up my
machine. So be _extremely_ careful if you ever try the code!

TODO:
. fix the resize inode case
. fix the observed soft lockup
. decide whether to keep it a module.
  If not, decide how/when to run the kernel thread
. initialize some blocks (for example the non-empty ones) at mount
  time, or somewhere else.
. non-empty group case
. feature interactions? (for example inode zeroing vs. resize)
. multiple threads (based on cpu/disks)
. other?
