Date:	Wed, 24 Oct 2012 13:49:20 +1100
From:	Dave Chinner <david@...morbit.com>
To:	Arnd Bergmann <arnd@...db.de>
Cc:	Jaegeuk Kim <jaegeuk.kim@...il.com>,
	Jaegeuk Kim <jaegeuk.kim@...sung.com>,
	'Vyacheslav Dubeyko' <slava@...eyko.com>,
	viro@...iv.linux.org.uk, 'Theodore Ts'o' <tytso@....edu>,
	gregkh@...uxfoundation.org, linux-kernel@...r.kernel.org,
	chur.lee@...sung.com, cm224.lee@...sung.com,
	jooyoung.hwang@...sung.com
Subject: Re: [PATCH 11/16] f2fs: add inode operations for special inodes

On Wed, Oct 17, 2012 at 12:50:11PM +0000, Arnd Bergmann wrote:
> On Tuesday 16 October 2012, Jaegeuk Kim wrote:
> > > IIRC, f2fs uses 4k inodes, so IMO per-inode xattr trees with
> > > internal storage before spilling to an external block is probably
> > > the best approach to take...
> > 
> > Yes, this is indeed the best approach for f2fs's xattrs.
> > Apart from giving fs hints, it is well worth optimizing later.
> 
> I've thought a bit more about how this could be represented efficiently
> in 4KB nodes. This would require a significant change of the way you
> represent inodes, but can improve a number of things at the same time.
> 
> The idea is to replace the fixed area in the inode that contains block
> pointers with an extensible TLV (type/length/value) list that can contain
> multiple variable-length fields, like this.

You've just re-invented inode forks... ;)

> All TLVs together with the
> fixed-length inode data can fill a 4KB block.
> 
> The obvious types would be:
> 
> * Direct file contents if the file is less than a block
> * List of block pointers, as before, minimum 1, maximum until the end
>   of the block
> * List of indirect pointers, now also a variable length, similar to the
>   list of block pointers
> * List of double-indirect block pointers
> * direct xattr: zero-terminated attribute name followed by contents
> * indirect xattr: zero-terminated attribute name followed by up to
>   16 block pointers to store a maximum of 64KB sized xattrs
> 
> This could be extended later to cover additional types, e.g. a list
> of erase block pointers, triple-indirect blocks or extents.
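
To make the TLV idea above concrete, here is a rough sketch of packing
and walking such a list. The type codes, the 16-bit type and length
header, and the function names are all assumptions for illustration;
the proposal only names the kinds of field, not their encoding.

```python
import struct

# Hypothetical type codes; the proposal does not assign any numbers.
TLV_INLINE_DATA = 1
TLV_BLOCK_PTRS = 2
TLV_XATTR_DIRECT = 5

HEADER = struct.Struct("<HH")  # assumed: 16-bit type, 16-bit payload length
BLOCK_SIZE = 4096              # inode block size from the thread


def pack_tlvs(fields, fixed_len=0):
    """Pack (type, payload) pairs; fail if they overflow the inode block."""
    out = bytearray()
    for ftype, payload in fields:
        out += HEADER.pack(ftype, len(payload)) + payload
    if fixed_len + len(out) > BLOCK_SIZE:
        raise ValueError("TLVs do not fit in the inode block")
    return bytes(out)


def unpack_tlvs(buf):
    """Walk the TLV area and return the (type, payload) pairs in order."""
    fields, off = [], 0
    while off + HEADER.size <= len(buf):
        ftype, length = HEADER.unpack_from(buf, off)
        off += HEADER.size
        if off + length > len(buf):
            raise ValueError("truncated TLV")
        fields.append((ftype, bytes(buf[off:off + length])))
        off += length
    return fields
```

A direct xattr would then be a payload such as b"user.hint\0hot"
(zero-terminated name followed by contents), and a block pointer list
would be a packed array of block numbers.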

An inode fork doesn't care about the data in it - it's just an
independent block mapping index. i.e. inline, direct,
indirect, double indirect. The data in the fork is managed
externally to the format of the fork. e.g. XFS has two forks - one
for storing data (file data, directory contents, etc) and the other
for storing attributes.

The main issue with supporting an arbitrary number of forks is space
management of the inode literal area. Say one fork is in inline
format (e.g. direct file contents) and then we add an attribute.
The attribute won't fit inline, nor will an extent format fork header,
so the inline data fork has to be converted to extent format before
the xattr can be added. Now scale that problem up to an arbitrary
number of forks....
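
A toy model of that space problem, to show where the conversions come
from. The literal area size, the extent record size, the Fork class and
add_fork() are all made up for illustration; this is not how XFS lays
things out on disk, only the shape of the bookkeeping.

```python
LITERAL_AREA = 256  # assumed size of the inode literal area, in bytes
EXTENT_REC = 16     # assumed footprint of a fork once in extent format


class Fork:
    def __init__(self, fmt, size):
        self.fmt = fmt    # "inline" or "extent"
        self.size = size  # bytes this fork occupies in the literal area


def add_fork(forks, new_size):
    """Make room for a new fork needing new_size bytes in the literal area.

    Inline forks are converted to extent format, largest first, until the
    new fork fits. Returns the list of forks that had to be converted.
    """
    converted = []
    used = sum(f.size for f in forks)
    for f in sorted(forks, key=lambda f: f.size, reverse=True):
        if used + new_size <= LITERAL_AREA:
            break
        if f.fmt == "inline" and f.size > EXTENT_REC:
            # The inline payload moves to an external block; only the
            # extent record stays behind in the literal area.
            used -= f.size - EXTENT_REC
            f.fmt, f.size = "extent", EXTENT_REC
            converted.append(f)
    if used + new_size > LITERAL_AREA:
        raise ValueError("no room in the literal area")
    forks.append(Fork("inline", new_size))
    return converted
```

With two forks there is at most one conversion to consider. With N
forks, every add or grow can force a choice among the others, and each
conversion is an allocation plus a rewrite.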

> As a variation of this, it would also be nice to turn around the order
> in which the pointers are walked, to optimize for space and for growing
> files, rather than for reading the beginning of a file. With this, you
> can represent a 9 KB file using a list of two block pointers, and 1KB
> of direct data, all in the inode. When the user adds another byte, you
> only need to rewrite the inode. Similarly, a 5 MB file would have a
> single indirect node (covering block pointers for 4 MB), plus 256
> separate block pointers (covering the last megabyte), and a 5 GB file
> can be represented using 1 double-indirect node and 256 indirect nodes,
> and each of them can still be followed by direct "tail" data and
> extended attributes.
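
The figures quoted above can be checked with a few lines of arithmetic.
This sketch assumes 4KB blocks and 1024 pointers per node (i.e. 4-byte
block pointers), which is what the 4 MB per indirect node figure
implies; the function name is hypothetical.

```python
BLOCK = 4096
PTRS_PER_BLOCK = 1024  # assumed: 4-byte pointers in a 4KB node


def tail_first_layout(size):
    """Split a file size in bytes into (double-indirect nodes,
    indirect nodes, direct block pointers, inline tail bytes),
    using the largest unit first."""
    indirect_span = BLOCK * PTRS_PER_BLOCK        # 4 MB per indirect node
    double_span = indirect_span * PTRS_PER_BLOCK  # 4 GB per double-indirect
    double, rest = divmod(size, double_span)
    indirect, rest = divmod(rest, indirect_span)
    direct, tail = divmod(rest, BLOCK)
    return double, indirect, direct, tail
```

For a 9 KB file this gives two block pointers plus 1024 bytes of inline
tail, and growing the file by one byte only changes the tail count,
which matches the "only need to rewrite the inode" claim.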

I'm not sure that the resultant code complexity is worth saving an
extra block here and there.

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
