linux-ext4 - Re: [PATCH, RFC 3/3] ext4: use the O_HOT and O_COLD open flags to influence inode allocation

Message-ID: <20120420022606.GA24486@thunk.org>
Date:	Thu, 19 Apr 2012 22:26:06 -0400
From:	Ted Ts'o <tytso@....edu>
To:	Dave Chinner <david@...morbit.com>
Cc:	linux-fsdevel@...r.kernel.org,
	Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH, RFC 3/3] ext4: use the O_HOT and O_COLD open flags to
 influence inode allocation

On Fri, Apr 20, 2012 at 09:27:57AM +1000, Dave Chinner wrote:
> So you're assuming that locating the inodes somewhere "hot" is going
> to improve performance. So say an application has a "hot" file (say
> an index file) but still has a lot of other files it creates and
> reads, and they are all in the same directory.
> 
> If the index file is created "hot", then it is going to be placed a
> long way away from all the other files that application is using,
> and every time you access the hot file you now seek away to a
> different location on disk. The net result: the application goes
> slower because average seek times have increased.

Well, let's assume the application is using all or most of the disk,
so the objects it is fetching from the 2T disk are randomly
distributed throughout the disk.  Short seeks are faster, yes, but the
seek time as a function of the seek distance is decidedly non-linear,
with a sharp "knee" in the curve at around 10-15% of a full-stroke
seek.  (Ref:
http://static.usenix.org/event/fast05/tech/schlosser/schlosser.pdf)

So as you seek back and forth fetching data objects, most of the time
you will be incurring 75-85% of the cost of a worst-case seek anyway.
So seeking *is* going to be a fact of life that we can't run away
from.
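
As a rough illustration of that estimate, here is a minimal sketch that
averages a toy seek curve over seeks between two uniformly random
positions.  The curve shape, the knee position (12.5% of a full stroke)
and the cost at the knee (70% of a full-stroke seek) are assumptions
picked for illustration, not numbers taken from the paper above; the
function names are hypothetical too.

```python
# Toy seek-time model.  KNEE_DIST and KNEE_COST are assumed values,
# not measurements from any real drive.
KNEE_DIST = 0.125   # knee at 12.5% of a full stroke (assumed)
KNEE_COST = 0.70    # cost at the knee, as a fraction of a full-stroke seek (assumed)


def seek_cost(d):
    """Normalized seek time for a normalized seek distance d in [0, 1]."""
    if d <= KNEE_DIST:
        # Steep square-root rise for short seeks.
        return KNEE_COST * (d / KNEE_DIST) ** 0.5
    # Slow linear rise from the knee up to the full-stroke cost of 1.0.
    return KNEE_COST + (1.0 - KNEE_COST) * (d - KNEE_DIST) / (1.0 - KNEE_DIST)


def mean_random_seek_cost(steps=100_000):
    """Average seek cost between two uniformly random positions.

    The distance between two uniform points on [0, 1] has density
    2 * (1 - d); integrate seek_cost against it with the midpoint rule.
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        d = (i + 0.5) * h
        total += seek_cost(d) * 2.0 * (1.0 - d) * h
    return total


if __name__ == "__main__":
    print(f"mean random seek: {mean_random_seek_cost():.3f} of a full stroke")
```

With these assumed values the average random seek lands at roughly 70%
of a full-stroke seek; a higher assumed cost at the knee pushes it up
further.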

Given that, the question then is whether we are better off (a) putting
the index files in the exact middle of the disk, trying to minimize
seeks, (b) scattering the index files all over the disk randomly, or
(c) concentrating the index files near the beginning of the disk?
Given the non-linear seek times, (c) would probably be the best
choice for this use case.
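
To get a feel for how much the shape of the curve matters here, the
sketch below compares the mean seek cost between an index placed in the
middle of the disk (a) and one placed at the edge (c), with data
positions uniformly random.  It uses a toy curve whose knee position and
knee cost are assumptions, not measured values, and hypothetical
function names.  In this toy model the middle still has the lower pure
seek cost, but the penalty for the edge drops from 2x under a linear
curve to well under 1.3x, so seek distance alone no longer decides much.

```python
# Toy comparison of index placement.  KNEE_DIST and KNEE_COST are
# assumed values, not measurements from any real drive.
KNEE_DIST = 0.125   # knee at 12.5% of a full stroke (assumed)
KNEE_COST = 0.70    # cost at the knee, as a fraction of a full-stroke seek (assumed)


def knee_cost(d):
    """Non-linear toy seek curve: steep up to the knee, nearly flat after."""
    if d <= KNEE_DIST:
        return KNEE_COST * (d / KNEE_DIST) ** 0.5
    return KNEE_COST + (1.0 - KNEE_COST) * (d - KNEE_DIST) / (1.0 - KNEE_DIST)


def linear_cost(d):
    """Naive model where seek time is proportional to seek distance."""
    return d


def mean_cost_from(index_pos, cost, steps=100_000):
    """Mean seek cost between index_pos and a uniformly random data position."""
    h = 1.0 / steps
    return sum(cost(abs((i + 0.5) * h - index_pos)) for i in range(steps)) * h


def edge_penalty(cost):
    """Ratio of mean seek cost for an index at the edge vs. in the middle."""
    return mean_cost_from(0.0, cost) / mean_cost_from(0.5, cost)


if __name__ == "__main__":
    print(f"edge/middle, linear curve: {edge_penalty(linear_cost):.2f}")
    print(f"edge/middle, knee curve:   {edge_penalty(knee_cost):.2f}")
```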

Note that when we short-stroke, it's not just a matter of minimizing
seek distances; if it were, then it wouldn't matter if we used the
first third of the disk closest to the outer edge, or the last third
of the disk closer to the inner part of the disk.  The outer tracks
also have a higher transfer rate, which is why the beginning of the
disk is the part worth using.

Granted this may be a relatively small effect compared to the huge
wins of placing your data according to its usage frequency on tiered
storage.  But the effect should still be there.

Cheers,

						- Ted