Message-ID: <20110223044427.GM21917@bitwizard.nl>
Date: Wed, 23 Feb 2011 05:44:27 +0100
From: Rogier Wolff <R.E.Wolff@...Wizard.nl>
To: Ted Ts'o <tytso@....edu>
Cc: linux-ext4@...r.kernel.org
Subject: Re: fsck performance.
On Tue, Feb 22, 2011 at 05:13:04PM -0500, Ted Ts'o wrote:
> On Tue, Feb 22, 2011 at 09:32:28AM -0700, Andreas Dilger wrote:
> >
> > Any idea what the hash size does to memory usage? I wonder if we
> > can scale this based on the directory count, or if the memory usage
> > is minimal (only needed in case of tdb) then just make it the
> > default. It definitely appears to have been a major performance
> > boost.
>
> Yeah, that was my question. Your patch adds a magic number which
> probably works well on your machine (and I'm not really worried if
> someone has less than 1G --- here's a quarter kid, buy yourself a
> real computer :-). But I wonder if we should be using a hash size
> which is sized automatically depending on available memory or file
> system size.
I fully agree that having "magic numbers" in the code is a bad thing.
A warning sign.
I don't agree with your argument that 1G of RAM should be considered
minimal. There are storage boxes (single-disk NAS systems) out there
that run on a 300MHz ARM chip and have little RAM. Some of them use
ext[234]. For example: http://iomega.nas-central.org/wiki/Main_Page
has 64MB of RAM; I'm not sure whether that CPU is even capable of
virtual memory. I just mention this one because a friend brought one
into my office last week. I don't think it happens to run Linux, but
some of the "competition" do run Linux.
As to the "disadvantage" of using a large hash size: as far as I can
see, the library just seeks to that position in the tdb file. With
32-bit file offsets (which are hardcoded into tdb), the penalty is
4*hash_size bytes of extra disk space. With my currently suggested
value of about a million buckets that comes to 4MB. As my current
tdb database amounts to 1.5GB, I think that cost is acceptable.
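
For context: the hash size we are talking about is simply the second
argument to tdb_open(). A minimal sketch (the path and the
one-million figure are illustrative, not what e2fsck actually
passes):

#include <tdb.h>
#include <fcntl.h>

/* Sketch: open a scratch tdb with an explicit hash size instead of
   relying on tdb's small default of 131 buckets.  The path and the
   1000000-bucket figure are made up for illustration. */
TDB_CONTEXT *open_scratch_tdb(void)
{
	return tdb_open("/var/tmp/scratch.tdb", 1000000,
			TDB_CLEAR_IF_FIRST, O_RDWR | O_CREAT, 0600);
}
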
With the number of keys up to 1 million, we can expect a speedup of
about 1M/131 = roughly 7600 over tdb's default hash size of 131
buckets. Above that we won't gain much anymore.

This is assuming that we have a next-to-perfect hash function. In
fact we don't, because I see only about 30% hash bucket usage, and
I'm quite sure my fsck has inserted over 1M keys by now.
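
As a sanity check on "next-to-perfect": throwing n keys uniformly at
m buckets fills an expected fraction 1 - (1 - 1/m)^n, which is about
1 - e^(-n/m), of the buckets. A throwaway program to evaluate that,
using the 1M keys / 1M buckets assumed above (link with -lm):

#include <math.h>
#include <stdio.h>

/* Expected fraction of occupied buckets for an ideal (uniform)
   hash: n keys into m buckets -> 1 - exp(-n/m). */
int main(void)
{
	double n = 1e6;	/* keys, as assumed above */
	double m = 1e6;	/* buckets, the suggested hash_size */

	printf("ideal bucket usage: %.0f%%\n",
	       100.0 * (1.0 - exp(-n / m)));
	return 0;
}

That prints 63%, so the 30% I am seeing really does point at the
hash function rather than the table size.
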
I just tested the hash function: I hashed the first 10 million
integers and got 91962 unique results, out of a possible 99931
buckets. At 100 keys per bucket a near-perfect hash should fill
essentially every bucket, yet about 8% of them are never hit. And
sequential integers are the simplest case to get right; e2fsck's
real keys apparently fare far worse (the 30% usage above).
Here is my test program.
#include <stdio.h>
#include <stdlib.h>

typedef unsigned int u32;

/* This is based on the hash algorithm from gdbm, as used by tdb
   (specialized here to 4-byte keys). */
static unsigned int default_tdb_hash(unsigned char *key)
{
	u32 value;	/* Used to compute the hash value. */
	u32 i;		/* Used to cycle through random values. */

	/* Set the initial value from the key size. */
	for (value = 0x238F13AF * 4, i = 0; i < 4; i++)
		value = (value + (key[i] << (i * 5 % 24)));

	return (1103515243 * value + 12345);
}

int main(int argc, char **argv)
{
	int i;
	int max = 1000000;

	if (argc > 1)
		max = atoi(argv[1]);

	/* Print "key bucket" pairs: the key's bytes are fed to the
	   hash and the result is reduced to 99931 buckets. */
	for (i = 0; i < max; i++)
		printf("%u %u\n", i,
		       default_tdb_hash((unsigned char *)&i) % 99931);

	exit(0);
}
And here is the command line I used to watch the results:
./a.out 10000000 | awk '{print $2}' | sort | uniq -c |sort -n | less
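
To get the unique-bucket count directly instead of paging through
the output, something like this should work:

./a.out 10000000 | awk '{print $2}' | sort -u | wc -l
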
It seems my "prime generator" program is wrong too: I had meant to
pick a prime when I chose 99931, but apparently it's not prime
(13*7687). For hashing that should not be too bad, but I went to
look for a real prime and checked again. Ok: hash bucket usage shot
up 16%.
I just "designed" a new hash function, based on the "hash" page on
Wikipedia.
static unsigned int my_tdb_hash(unsigned char *key)
{
	u32 value;	/* Used to compute the hash value. */
	u32 i;		/* Used to cycle through the key bytes. */

	/* Multiply by 256, fold the bits shifted out of the top
	   back in (scaled by 241), and add the next key byte. */
	for (value = 0, i = 0; i < 4; i++)
		value = value * 256 + key[i] + (value >> 24) * 241;

	return value;
}
It behaves MUCH better than the "default_tdb_hash" in that it
achieves 100% bucket usage (not almost, but exactly 100%). It's not
that hard to get right.
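
For anyone who wants to reproduce the comparison without the
sort/uniq pipeline, a self-contained harness along these lines
should do (same 4-byte little-endian keys, same 99931-bucket
reduction as above):

#include <stdio.h>
#include <string.h>

typedef unsigned int u32;

#define BUCKETS 99931

/* tdb's default hash, specialized to 4-byte keys as above. */
static unsigned int default_tdb_hash(unsigned char *key)
{
	u32 value, i;

	for (value = 0x238F13AF * 4, i = 0; i < 4; i++)
		value = (value + (key[i] << (i * 5 % 24)));
	return (1103515243 * value + 12345);
}

/* The replacement hash proposed above. */
static unsigned int my_tdb_hash(unsigned char *key)
{
	u32 value, i;

	for (value = 0, i = 0; i < 4; i++)
		value = value * 256 + key[i] + (value >> 24) * 241;
	return value;
}

/* Hash the first nkeys integers and report how many of the
   BUCKETS buckets were hit at least once. */
static void report(const char *name,
		   unsigned int (*hash)(unsigned char *), int nkeys)
{
	static char used[BUCKETS];
	int i, n = 0;

	memset(used, 0, sizeof(used));
	for (i = 0; i < nkeys; i++)
		used[hash((unsigned char *)&i) % BUCKETS] = 1;
	for (i = 0; i < BUCKETS; i++)
		n += used[i];
	printf("%s: %d of %d buckets used (%.1f%%)\n",
	       name, n, BUCKETS, 100.0 * n / BUCKETS);
}

int main(void)
{
	report("default_tdb_hash", default_tdb_hash, 10000000);
	report("my_tdb_hash", my_tdb_hash, 10000000);
	return 0;
}
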
The "hash" at the end (times BIGPRIME plus RANDOMVALUE) in the
original is redundant: since the multiplier is odd, x ->
1103515243*x + 12345 is a bijection on 32-bit values, so it can
neither create nor remove collisions in the 32-bit output. It only
serves to make the results less obvious to humans; there is no
computer-science relevant reason for it.
I'll shoot off an email to the TDB guys as well.
Roger.
--
** R.E.Wolff@...Wizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ