Message-ID: <2s73sed5n6kxg42xqceenjtcwxys4j2r5dc5x4fdtwkmhkw3go@7viy7qli43wd>
Date: Sat, 24 Feb 2024 22:18:31 -0500
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: David Laight <David.Laight@...lab.com>
Cc: 'Herbert Xu' <herbert@...dor.apana.org.au>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Thomas Graf <tgraf@...g.ch>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"maple-tree@...ts.infradead.org" <maple-tree@...ts.infradead.org>, "rcu@...r.kernel.org" <rcu@...r.kernel.org>
Subject: Re: [PATCH 0/1] Rosebush, a new hash table

On Sat, Feb 24, 2024 at 10:10:27PM +0000, David Laight wrote:
> From: Herbert Xu
> > Sent: 24 February 2024 00:21
> >
> > On Thu, Feb 22, 2024 at 08:37:23PM +0000, Matthew Wilcox (Oracle) wrote:
> > >
> > > Where I expect rosebush to shine is on dependent cache misses.
> > > I've assumed an average chain length of 10 for rhashtable in the above
> > > memory calculations. That means on average a lookup would take five cache
> > > misses that can't be speculated. Rosebush does a linear walk of 4-byte
> >
> > Normally an rhashtable gets resized when it reaches 75% capacity
> > so the average chain length should always be one.
>
> The average length of non-empty hash chains is more interesting.
> You don't usually search for items in empty chains.
> The only way you'll get all the chains of length one is if you've
> carefully picked the data so that it hashed that way.
>
> I remember playing around with the elf symbol table for a browser
> and all its shared libraries.
> While the hash function is pretty trivial, it really didn't matter
> whether you divided 2^n, 2^n-1 or 'the prime below 2^n' some hash
> chains were always long.

that's a pretty bad hash, even golden ratio hash would be better, but
still bad; you really should be using at least jhash.

you really want a good avalanche effect, because in real world usage
your entropy is often only in a relatively few bits.
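
a rough way to see what "avalanche" means here: flip one input bit and
count how many output bits change. the sketch below is only an
illustration, not code from this thread. mult_hash32() is a made-up name
for a plain multiply by an odd constant (0x61C88647, the value I believe
the kernel uses as GOLDEN_RATIO_32), kept at full 32-bit width, and
fmix32() is the murmur3 32-bit finalizer, standing in for "a mixer with
good avalanche". avg_flipped_bits() and its LCG input generator are
likewise assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: multiplicative hash, one multiply by an odd constant. */
static uint32_t mult_hash32(uint32_t x)
{
	return x * 0x61C88647u;
}

/* murmur3 32-bit finalizer: xor-shifts interleaved with multiplies. */
static uint32_t fmix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85EBCA6Bu;
	h ^= h >> 13;
	h *= 0xC2B2AE35u;
	h ^= h >> 16;
	return h;
}

/* Count set bits without relying on compiler builtins. */
static unsigned popcount32(uint32_t v)
{
	unsigned n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Average number of output bits that change when input bit `bit` is
 * flipped, measured over n pseudo-random sample inputs.
 */
static double avg_flipped_bits(uint32_t (*hash)(uint32_t), unsigned bit,
			       uint32_t n)
{
	uint64_t total = 0;
	uint32_t x = 1;
	uint32_t i;

	assert(n > 0 && bit < 32);
	for (i = 0; i < n; i++) {
		x = x * 1664525u + 1013904223u;	/* LCG step, sample inputs only */
		total += popcount32(hash(x) ^ hash(x ^ (UINT32_C(1) << bit)));
	}
	return (double)total / n;
}
```

calling avg_flipped_bits(mult_hash32, 31, 1000) gives exactly 1: a
multiply only carries differences upward, so a flip in the top input bit
never reaches the lower output bits. the same call with fmix32 should
land somewhere near 16, i.e. about half of the output bits change.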
when I implemented cuckoo (which is more obviously sensitive to a weak
hash function), I had to go with siphash, even jhash wasn't giving me
great results. and looking at the code it's not hard to see why, it's all
adds, and the rotates are byte aligned... you want mixed adds and xors
and the rotates to be more prime-ish.

right idea, just old...

what would be ideal is something more like siphash, but with fewer
rounds, so same number of instructions as jhash. xxhash might fit the
bill, I haven't looked at the code yet...
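
to make "more like siphash, but with fewer rounds" concrete, here is a
minimal sketch, with several assumptions. sip_round() follows the
add/rotate/xor round used by SipHash as I remember it (rotate amounts 13,
16, 21, 17 and 32). everything else is hypothetical: toy_arx_hash64(),
its single 64-bit key, its one-word input and the way the state is
folded at the end are invented for illustration. it is not SipHash, has
not been analysed, and should not be used as-is.

```c
#include <stdint.h>

/* Rotate left; r must be in 1..63. */
static uint64_t rotl64(uint64_t v, unsigned r)
{
	return (v << r) | (v >> (64 - r));
}

struct arx_state {
	uint64_t v0, v1, v2, v3;
};

/* One add/rotate/xor round in the style of SipHash's SipRound. */
static void sip_round(struct arx_state *s)
{
	s->v0 += s->v1; s->v1 = rotl64(s->v1, 13); s->v1 ^= s->v0;
	s->v0 = rotl64(s->v0, 32);
	s->v2 += s->v3; s->v3 = rotl64(s->v3, 16); s->v3 ^= s->v2;
	s->v0 += s->v3; s->v3 = rotl64(s->v3, 21); s->v3 ^= s->v0;
	s->v2 += s->v1; s->v1 = rotl64(s->v1, 17); s->v1 ^= s->v2;
	s->v2 = rotl64(s->v2, 32);
}

/*
 * Hypothetical toy: hash a single 64-bit word under a 64-bit key, with
 * the number of rounds per phase chosen by the caller. The four
 * constants only break the symmetry of the initial state.
 */
static uint64_t toy_arx_hash64(uint64_t key, uint64_t word, unsigned rounds)
{
	struct arx_state s = {
		key ^ 0x736f6d6570736575ULL,
		key ^ 0x646f72616e646f6dULL,
		key ^ 0x6c7967656e657261ULL,
		key ^ 0x7465646279746573ULL,
	};
	unsigned i;

	s.v3 ^= word;			/* absorb the input word */
	for (i = 0; i < rounds; i++)
		sip_round(&s);
	s.v0 ^= word;

	s.v2 ^= 0xff;			/* finalization marker */
	for (i = 0; i < rounds; i++)
		sip_round(&s);

	return s.v0 ^ s.v1 ^ s.v2 ^ s.v3;	/* fold state to 64 bits */
}
```

the rounds parameter is the knob: fewer rounds means fewer instructions
and weaker mixing. with rounds set to 0 the input word cancels out of
the final xor completely, which is a handy reminder that all of the
mixing comes from the rounds themselves.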