linux-ext4 - Re: [RFC] dynamic inodes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Fri, 26 Sep 2008 14:01:45 -0600
From:	Andreas Dilger <adilger@....com>
To:	"Jose R. Santos" <jrs@...ibm.com>
Cc:	Alex Tomas <bzzz@....com>,
	ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: [RFC] dynamic inodes

On Sep 26, 2008  09:49 -0500, Jose R. Santos wrote:
> Agreed, but performance wise this way is more consistent with the
> current block and inode allocators.  The block allocator will start its
> free block search on the block group that contains the inode.  Since
> these block groups do not contain any blocks, the block allocator will
> have to be modify to make sure data is not being placed randomly in the
> disk.

This is already the case today when a block group is full.  The block
allocator needs to handle this gracefully.

> The flex_bg inode allocator would also need to be modify since
> it currently depends on a algoright that assumes that block groups
> contain actual blocks.  One of the things that got flex_bg added to
> ext4 in the first place was performance the performance improvements it
> provided.  I would like to keep that advantage if possible.

I don't think the performance advantage was at all related to inode->block
locality (since this is actually worse with FLEX_BG) but rather better
metadata locality (e.g. contiguous bitmaps, itables avoiding seeking
during metadata operations).

> This could also be use to speed mkfs since we would not need to zero
> out as many inode tables.  We could initialize just a couple of inode
> tables per flex_bg group and allocate the rest dynamically.

There is already the ability to avoid zeroing ANY inode tables with
uninit_bg, but it is unsafe to do this in production because the old
itable data is there and e2fsck might become confused if the group
bg_itable_unused is lost (due to gdt corruption or other inconsistency).

> You do pay
> a small penalty when allocating a new inode table since we first need
> to find the blocks for that inode table as well as zeroing it afterward.
> The penalty is less than if we do the one time background zeroing of
> inode tables where your disk will be trashing for a while the first
> time it is mounted.

I don't think it is any different.  The itable zeroing is _still_ needed,
because the flag that indicates if an itable is used or not is unreliable
in some corruption cases, and we don't want to read garbage from disk.
IMHO when a filesystem is first formatted and mounted it is probably
mostly idle, and if not the zeroing (and other stuff) thread can be delayed
(e.g. in a new distro install maybe the itables aren't zeroed until the
second or third mount, no great loss/risk).

> If supporting already existing filesystems is really important we could
> always implement both techniques since they technically should not
> conflict with each other, though you couldn't use both of them at the
> same time if you have a 1:1 block/inode ratio.

IMHO dynamic inode tables for existing filesystems is the MAIN goal.
Once you know you have run out of inodes it is already too late to plan
for it, and if you need a reformat to implement this scheme you could
just as easily reformat with enough inodes in the first place :-).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html