Message-id: <20080522160317.GY3516@webber.adilger.int>
Date:	Thu, 22 May 2008 10:03:18 -0600
From:	Andreas Dilger <adilger@....com>
To:	Nathan Roberts <nroberts@...oo-inc.com>
Cc:	linux-ext4@...r.kernel.org
Subject: Re: Storing inodes in a separate block device?

On May 22, 2008  09:53 -0500, Nathan Roberts wrote:
> Has a feature ever been considered (or already exist) for storing inodes in 
> a block device separate from the data? Is it even a "reasonable" thing to 
> do or are there major pitfalls that one would run into?

There was a filesystem called "dualfs" that implemented this - I believe
it was a hacked version of ext3.  It showed quite decent results.
Similarly, Lustre splits the filesystem metadata (paths, permissions,
attributes) onto filesystems separate from the data.

For ext4 there is work being done on the FLEX_BG feature, which will allow
clustering of the metadata into larger groups inside the filesystem, in
order to reduce seeking when doing filesystem scans.  This could be taken
to extremes to group all of the metadata into a single area, and use LVM
to place that on a separate disk.
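For reference, FLEX_BG can be enabled at mkfs time with e2fsprogs; a
minimal sketch, where /dev/sdb1 is a placeholder device and the -G
value (block groups packed per flex group) is just an example:

```shell
# Format with FLEX_BG, clustering the metadata of 64 block groups
# together into one region.  /dev/sdb1 is a placeholder device.
mkfs.ext4 -O flex_bg -G 64 /dev/sdb1

# Confirm the feature and flex group size from the superblock.
dumpe2fs -h /dev/sdb1 | grep -i flex
```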

Putting the journal on a separate disk would also help reduce the seeking
during writes.  Using flash for the journal is not useful, though, because
the journal does almost exclusively linear IO and no seeking.
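An external journal device can already be set up with the standard
e2fsprogs tools; a minimal sketch, assuming /dev/sdc1 holds the journal
and /dev/sdb1 the filesystem (both device names are placeholders):

```shell
# Create a dedicated journal device on the separate disk.  The block
# size must match the filesystem that will use it (4096 here).
mke2fs -O journal_dev -b 4096 /dev/sdc1

# Create the filesystem with its journal on the external device.
mkfs.ext4 -b 4096 -J device=/dev/sdc1 /dev/sdb1
```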

> The rationale behind this question comes from use cases where a file system 
> is storing very large numbers of files. Reading files in these file systems 
> will essentially incur at least two seeks: one for the inode, one for the 
> data blocks. If the seek to the inode were more efficient, dramatic 
> performance gains could be achieved for such use cases.
>
> Fast seeking devices (such as flash based devices) are becoming much more 
> mainstream these days and would seem like a reasonable device for the 
> inodes. The $/GB is not as good as disks but it's much better than DRAM. 
> For many use cases, the number of these "fast access" inodes that would 
> need to be cached in RAM is near 0. So, RAM savings are also a potential 
> benefit.
>
> I've run some basic tests using ext4 on a SATA array plus a USB thumb drive 
> for the inodes. Even with the slowness of a thumb drive, I was able to see 
> encouraging results ( >50% read throughput improvement for a mixture of 
> 4K-8K files).

Ah, are you using FLEX_BG for this test?  It would also be interesting to
see if splitting the metadata onto a normal disk had the same effect,
just by virtue of allowing the data and metadata to seek independently.

There is also work to aggregate the allocation of ext4 file allocation
metadata (the blocks that map files to their data blocks), and while
this speeds up unlink, the current algorithm hurts read performance.
Having the file allocation metadata on a separate disk may avoid this
performance hit.  It may also be that we just need to make more effort
to do readahead on this metadata.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
