Date:	Tue, 30 Sep 2008 18:02:16 +0400
From:	Alex Tomas <bzzz@....com>
To:	Andreas Dilger <adilger@....com>
Cc:	ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: [RFC] dynamic inodes

Andreas Dilger wrote:
> It _sounds_ simple, but I think the implementation will not be what
> is expected.  Either you keep a 3rd bitmap (I&B) for each group, used
> for finding either inodes or blocks first (with find_first_bit() or
> find_first_zero_bit() respectively), and then check the "normal"
> inode and block bitmaps; that bitmap has to be kept in sync with
> mballoc, and it creates confusion/danger on disk and in e2fsck
> because in-use itable blocks are marked "0" in the block bitmap.
> There will be races between updates of these bitmaps unless the group
> is locked for both block and inode allocations on any update, because
> setting any bit completely changes the meaning.
> 
> Alternatively, if there are only I and B bitmaps, then
> find_first_bit() and find_first_zero_bit() alone are not useful.
> Searching for free blocks means looking for "B:0" but potentially
> finding many "B:0 I:1" blocks that are full of inodes.  Searching for
> free inodes means looking for "I:1" (strangely) but potentially
> finding many "I:1 B:0" blocks with no free slot inside.

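(To make the two-bitmap search above concrete: a simplified userspace
sketch, with hypothetical structures rather than the real ext4/mballoc
code.  In that scheme a truly free block is B:0 I:0, while an in-use
itable block is B:0 I:1, so find_first_zero_bit() on B alone is not
enough.)

#include <stdbool.h>

#define BLOCKS_PER_GROUP 32768
#define BITS_PER_LONG    (8 * sizeof(unsigned long))

struct group_bitmaps {
        unsigned long B[BLOCKS_PER_GROUP / BITS_PER_LONG]; /* 1 = in use */
        unsigned long I[BLOCKS_PER_GROUP / BITS_PER_LONG]; /* 1 = itable */
};

static bool test_bit(const unsigned long *map, unsigned int nr)
{
        return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}

/* Free-block search: scanning B alone would also return B:0 I:1
 * blocks that are full of inodes, so both bitmaps must be checked. */
static int find_free_block(const struct group_bitmaps *g)
{
        for (unsigned int nr = 0; nr < BLOCKS_PER_GROUP; nr++)
                if (!test_bit(g->B, nr) && !test_bit(g->I, nr))
                        return (int)nr;
        return -1;
}

/* Free-inode search: look for I:1 blocks, each of which still has to
 * be scanned for a free slot inside. */
static int find_itable_block(const struct group_bitmaps *g)
{
        for (unsigned int nr = 0; nr < BLOCKS_PER_GROUP; nr++)
                if (test_bit(g->I, nr))
                        return (int)nr;
        return -1;
}
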
mballoc already maintains its own in-core copy, so we'd have to layer
another bitmap onto it.  As for races, I think this can be handled by
proper ordering, probably even without locks: a free block is marked
used first, and only then becomes part of the "fragmentary" space (a
rough sketch below).  In any case, the complexity would be far smaller
than that of mballoc itself, for example.
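
A rough sketch of that ordering, in userspace C11 atomics with
hypothetical names -- not the real mballoc code.  The invariant is
that a block becomes visible as "fragmentary" only after it is already
marked in-use in the block bitmap, so the block allocator can never
hand it out:

#include <stdatomic.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

extern _Atomic unsigned long block_bitmap[]; /* 1 = block in use */
extern _Atomic unsigned long frag_bitmap[];  /* 1 = has small-file slots */

static void make_block_fragmentary(unsigned int nr)
{
        /* Step 1: the free block turns used first. */
        atomic_fetch_or_explicit(&block_bitmap[nr / BITS_PER_LONG],
                                 1UL << (nr % BITS_PER_LONG),
                                 memory_order_relaxed);
        /* Step 2: only then does it join fragmentary space.  The
         * release pairs with an acquire load of frag_bitmap, so anyone
         * who sees the frag bit also sees the block marked in use. */
        atomic_fetch_or_explicit(&frag_bitmap[nr / BITS_PER_LONG],
                                 1UL << (nr % BITS_PER_LONG),
                                 memory_order_release);
}

Freeing would go the other way around: clear the frag bit first, then
the block bit.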

> I much prefer the dynamic itable idea from José (which I embellished
> in my other email), which is very simple for both the kernel and
> e2fsck, robust, and avoids the 64-bit inode problem for userspace to
> the greatest extent possible (i.e. a full 4 billion inodes must be in
> use before we ever need to use 64-bit inodes).  The lack of
> complexity in itable allocation also translates directly into
> increased robustness in the face of corruption.
> 
> It doesn't provide dynamically sized inodes (which hasn't
> traditionally been a problem), nor is it perfect in terms of being
> able to fully populate a filesystem with inodes in all use cases, but
> it could work in all but completely pathological fragmentation cases
> (at which point one wonders whether it isn't better to just return
> -ENOSPC than to flog a nearly dead filesystem).  It can definitely do
> a good job in the most likely uses, and it also provides a big win
> over what is done today.

I do understand simplicity and robustness as driving reasons, and I
agree that dynamic inodes added via empty group descriptors are an
excellent idea.  But I still think that with the original idea we
could get much more than just dynamic inodes (even though those were
its original intention).  For example, storing small directories (up
to ~200 dir entries) within the inode could be very nice for avoiding
a bunch of seeks, and tail packing could be as well.  Notice that we
don't really need elaborate structures to find free slots in that
fragmentary space: small files are usually generated all at once
(e.g. tar -xf), so to pack them we just need to remember the last few
"partially filled" blocks (a rough sketch below).  If some file is
later deleted and we get a free slot in the corresponding block, who
cares: with current ext3/4 we'd waste the whole block anyway.
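
A minimal sketch of that "last few partially filled blocks"
bookkeeping -- hypothetical names, userspace C, fixed 4k blocks, not a
real allocator.  A slot freed later is simply forgotten, which is
exactly the "who cares" case above:

#include <stddef.h>

#define BLOCK_SIZE 4096
#define NR_PARTIAL 4 /* how many partial blocks to remember */

struct partial_block {
        unsigned long blocknr;    /* 0 = unused entry */
        size_t        free_bytes; /* space left at the end */
};

static struct partial_block partial[NR_PARTIAL];

extern unsigned long alloc_new_block(void); /* hypothetical */

/* Pick a block to pack `len` bytes into (len <= BLOCK_SIZE) and
 * reserve the space; *offset says where in the block to write. */
static unsigned long pack_small(size_t len, size_t *offset)
{
        for (int i = 0; i < NR_PARTIAL; i++) {
                struct partial_block *p = &partial[i];
                if (p->blocknr && p->free_bytes >= len) {
                        *offset = BLOCK_SIZE - p->free_bytes;
                        p->free_bytes -= len;
                        return p->blocknr;
                }
        }
        /* Nothing remembered fits: start a fresh block and remember
         * it, dropping the oldest entry. */
        for (int i = 0; i < NR_PARTIAL - 1; i++)
                partial[i] = partial[i + 1];
        partial[NR_PARTIAL - 1].blocknr = alloc_new_block();
        partial[NR_PARTIAL - 1].free_bytes = BLOCK_SIZE - len;
        *offset = 0;
        return partial[NR_PARTIAL - 1].blocknr;
}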

Another reason for that design was to support >2^32 files per
filesystem.

thanks, Alex
