linux-kernel - Re: [Slightly off topic] A question about R/B trees.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87myh2jq55.fsf@basil.nowhere.org>
Date:	Sat, 18 Oct 2008 09:53:10 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Chris Snook <csnook@...hat.com>
Cc:	Maxim Levitsky <maximlevitsky@...il.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [Slightly off topic] A question about R/B trees.

Chris Snook <csnook@...hat.com> writes:

> Maxim Levitsky wrote:
>> I am working on my small project, and I need a fast container to
>> hold a large sparse array.
>> Balanced trees seem to fit perfectly.
>
> Balanced trees take O(log n) to perform a great many operations, and
> traversing a binary tree is a particularly bad case for branch
> prediction.  Hash tables will perform much better, unless you get them
> horribly wrong.

That seems like a unfair generalization.

The problem with hash tables is that if they're big enough
or if the rest of the workload is memory intensive
each hit will be a cache miss. And you can do a lot of branch
mispredicts in the time of a single cache miss.

In general trees can be much better for cache usage, although
it's generally better to use some tree that has nodes near 
the cache line size. Binary trees like RB are too small for that.

The other advantage of trees is that they scale naturally.
For hash tables you either have it being sized for the worst
case (usually wasting a lot of memory[1] and making the cache miss
problem worse) or you need to do dynamic rehashing which 
is complex and difficult to get right especially in multi threaded
situations.

That is why the trend in Linux at least is to move away from
hash tables.

[1] Just take a look at how much memory that various hash 
tables in Linux use: dmesg | grep hash

> The kernel is written in a dialect of C that makes several assumptions
> about the compiler, among them that the compiler won't screw this up
> unless you tell it to.  Any compiler that has alignment problems with
> the rbtree code is going to have similar problems in lots of other
> places too.  We don't support those compilers.

At least upto 4 bytes or so it's generally a safe assumption
that objects will be naturally aligned. Also except for tree 
root the objects are typically allocated with malloc() (or
equivalent kmalloc) anyways and malloc()s guarantee
worst case alignment.

-Andi
-- 
ak@...ux.intel.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/