Message-ID: <20081028012557.GP18495@disturbed>
Date: Tue, 28 Oct 2008 12:25:57 +1100
From: Dave Chinner <david@...morbit.com>
To: Jörn Engel <joern@...fs.org>
Cc: linux-kernel@...r.kernel.org
Subject: Re: [RFC] B+Tree library
On Sun, Oct 26, 2008 at 01:46:44PM +0100, Jörn Engel wrote:
> The idea of using btrees in the kernel is not exactly new. They have a
> number of advantages over rbtrees and hashtables, but also a number of
> drawbacks. More importantly, logfs already makes use of them and -
> since I don't want any incompatible code to get merged and suffer the
> trouble it creates - I would like to discuss my implementation and where
> it makes sense and where it doesn't.
>
> General advantages of btrees are memory density and efficient use of
> cachelines. Hashtables are either too small and degrade into linked
> list performance, or they are too large and waste memory. With changing
> workloads, both may be true on the same system. Rbtrees have a bad
> fanout of less than 2 (they are not actually balanced binary trees),
> so each lookup has to read a fairly large number of cachelines.
>
> The main disadvantage of btrees is that they are complicated and come in
> a gazillion subtly different variants that differ mainly in the balance
> between read efficiency and write efficiency. Comparing btrees against
> anything is a bit like comparing apples and random fruits.
>
> This implementation is extremely simple. It splits nodes when they
> overflow. It does not move elements to neighboring nodes. It does not
> try fancy 2:3 splits. It does not even merge nodes when they shrink,
> making degenerate cases possible. And it requires callers to do
> tree-global locking. In effect, it will be hard to find anything less
> sophisticated.
>
> The one aspect where my implementation is actually nice is in allowing
> variable key length. Btree keys are interpreted as an array of unsigned
> long. So by passing the correct geometry to the core functions, it is
> possible to handle 32bit, 64bit or 128bit btrees, which logfs uses. If
> so desired, any other weird data format can be used as well (Zach, are
> you reading this?).
>
> So would something like this be merged once some users are identified?
> Would it be useful for anything but logfs? Or am I plain nuts?
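
To illustrate the variable key length idea quoted above, here is a minimal
sketch in C of comparing keys that are stored as arrays of unsigned long.
The names key_geo and key_cmp are hypothetical and are not taken from the
posted patch, and the most-significant-word-first ordering is an assumption
made for the example.

```c
#include <stddef.h>

/* Hypothetical geometry descriptor: how many unsigned longs make up a key. */
struct key_geo {
	size_t keylen;
};

/*
 * Compare two keys word by word. Word 0 is assumed to be the most
 * significant. Returns <0, 0 or >0, in the style of memcmp().
 */
static int key_cmp(const struct key_geo *geo,
		   const unsigned long *a, const unsigned long *b)
{
	size_t i;

	for (i = 0; i < geo->keylen; i++) {
		if (a[i] < b[i])
			return -1;
		if (a[i] > b[i])
			return 1;
	}
	return 0;
}
```

On a machine where unsigned long is 64 bits wide, a 64bit key would use
keylen = 1 and a 128bit key keylen = 2; the core code only ever sees the
geometry, never the key width itself.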
I think a btree library would be useful - there are places where
people are using rbtrees instead of btrees simply because it's
easier to use the rbtree code than it is to implement a btree library.
I can think of several places I could use such a library for
in-memory extent representation....
That being said, I haven't had a chance to look at that code yet....
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com