[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20090207122618.GA17222@logfs.org>
Date: Sat, 7 Feb 2009 13:26:18 +0100
From: Jörn Engel <joern@...fs.org>
To: Johannes Berg <johannes@...solutions.net>
Cc: Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Theodore Tso <tytso@....edu>, Andi Kleen <andi@...stfloor.org>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Linux Kernel list <linux-kernel@...r.kernel.org>,
"Luis R. Rodriguez" <mcgrof@...il.com>
Subject: Re: [PATCH] add b+tree library
On Thu, 5 February 2009 01:17:46 +0100, Johannes Berg wrote:
>
> Joern may need arbitrary key lengths, don't. But I've just looked around
> a little:
>
> * radix trees are completely unsuitable for use as a sort of hash table
> because of their behaviour when keys are not at last mostly
> contiguous
> * rbtrees require lots of boilerplate code, and have much worse cache
> behaviour
I did some testing as well. And I didn't like the results very much.
In my test harness, rbtrees performed roughly twice as good as btrees.
Something clearly is wrong with my theory. To spell it out, the theory
assumes that 1) CPUs get continuously faster at computations while
memory latency stays roughly constant and as a result 2) current CPUs
are sufficiently fast that memory latency is more important than a large
amount of computation. And maybe 3) L1 cache latency can be ignored,
while DRAM latency most definitely can not.
At least one of the above must be wrong. Another interesting data point
is that after hacking up binary search within nodes, btrees performed
roughly 10% better than before. Binary search means we mispredict every
other branch, yet this still improved performance. So on my test CPU
(Pentium M), branch mispredictions must be relatively cheap compared to
either calculations or L1 cache latency.
I also tried to rig the tests to favor btrees over rbtrees. Since the
rb_node is embedded in an object, I grew those objects beyond cacheline
size, so no two rb_nodes would ever share a cacheline while all the
pointers in btrees still do. And still btrees lost. Well - if the
dataset is large enough that all the object padding is comsuming half a
gigabyte of memory, swapping will make the rbtree load go slow, but
given enough free memory and no swapping (i.e. second run) it beats
btrees.
So there are two results I can see from all this. Rbtrees are still a
good choice an semi-current machines and the kernel doesn't need much
rework yet. Whether my assumption 2) above will match reality better in
the future and the scales will tip to the other side I don't know. The
other is that my assumptions are wrong somewhere and I don't yet
understand where. If anyone has an idea, I'd be glad to hear about it.
Jörn
--
Science is like sex: sometimes something useful comes out,
but that is not the reason we are doing it.
-- Richard Feynman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists