Message-ID: <3a2uv6dsl3xqw2bpeob7tq6e52ezjrpuvcqqvttb5goltdzpsd@dsj6fxwqi32g>
Date: Sat, 23 Nov 2024 22:30:46 -0500
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Herbert Xu <herbert@...dor.apana.org.au>
Cc: Thomas Graf <tgraf@...g.ch>, netdev@...r.kernel.org,
Neil Brown <neilb@...e.de>
Subject: Re: rhashtable issue - -EBUSY
On Sun, Nov 24, 2024 at 11:16:11AM +0800, Herbert Xu wrote:
> On Sat, Nov 23, 2024 at 12:21:21PM -0500, Kent Overstreet wrote:
> > I'm seeing an issue where rhashtable inserts sporadically return -EBUSY,
> > thrown from rhashtable_insert_rehash() when there's already a rehash in
> > progress, i.e. rehashes aren't keeping up.
>
> EBUSY should never happen *if* you're using the hash table correctly.
> It is expected to happen if you insert multiple identical entries
> into the same hash table (the correct thing to do in that case is
> to use rhltable which is designed to accommodate multiple entries
> with the same key).
Well, reading the code, that would appear to be -EEXIST, which is
entirely separate from the path you describe below.
We do indeed expect to hit the -EEXIST path in normal operation
(multiple threads racing to pull in the same inode).
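For context, the insert path on our side looks roughly like this - a
simplified sketch, with struct and field names made up for illustration,
not the actual bcachefs code:

	#include <linux/rhashtable.h>

	struct vfs_inode_entry {
		u64			inum;
		struct rhash_head	hash;
	};

	static const struct rhashtable_params vfs_inode_params = {
		.key_len	= sizeof(u64),
		.key_offset	= offsetof(struct vfs_inode_entry, inum),
		.head_offset	= offsetof(struct vfs_inode_entry, hash),
	};

	static int vfs_inode_insert(struct rhashtable *table,
				    struct vfs_inode_entry *entry)
	{
		int ret = rhashtable_lookup_insert_fast(table, &entry->hash,
							vfs_inode_params);

		/* -EEXIST just means another thread won the race to
		 * instantiate this inode; we drop ours and use theirs. */
		return ret;
	}

So -EEXIST is routine for us; -EBUSY is the part that's surprising.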
> Now assuming that is not the case and you are using rhashtable
> correctly, this is when an EBUSY will occur during an insert:
>
> 1) The hash table elasticity has been violated, meaning that
> more than 16 entries are in a single chain.
>
> 2) The hash table is below 50% capacity (meaning that statistically
> we do not expect 1) to be true).
>
> 3) An existing rehash is already taking place.
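Restating those three conditions as pseudocode, just to make sure I'm
reading you right (a paraphrase of the above, not the actual
lib/rhashtable.c logic):

	if (chain_len > elasticity /* 16 */ &&		/* 1) */
	    nelems < tbl->size / 2 &&			/* 2) */
	    rehash_already_in_progress)			/* 3) */
		return -EBUSY;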
This appears to be the path we're hitting.
> The reason we have the EBUSY mechanism in place is to prevent
> the case where we are being actively attacked by a hostile
> actor, who has somehow compromised our hash function.
>
> The first line of defence is to change our hash function and
> conduct a rehash. This is what would have occurred if 3) were
> false.
>
> Once a rehash is in place, if we hit 1) + 2) again, then it
> means that our defence was futile and the hostile actor is
> still able to create arbitrarily long hash chains (possibly
> by compromising our RNG since we use that for the rehash),
> and as we do not have any defence mechanism for that, it is
> better to just fail.
>
> Of course this is meant to be impossible to hit in practice.
> Whenever it has occurred in the past, it's always been because
> people were trying to insert identically keyed entries into the
> same table (use rhltable instead).
>
> Theoretically it is also possible to hit this if your hash
> table was immense (expected worst-case hash chain length is
> log N / log log N), but it has never been hit in the past
> with our default elasticity of 16.
Thanks for the background.
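If I'm reading that bound right, it takes an enormous table to matter:
for N on the order of a million entries, log N / log log N works out to
about ln(1e6) / ln(ln(1e6)) ~= 13.8 / 2.6 ~= 5, so an honestly
distributed hash should sit nowhere near an elasticity of 16.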
With the default hash function (jhash), I suspect this isn't as
impossible as you think - I had issues with jhash when doing cuckoo
hashing, and had to switch to siphash there.
I'll see if switching to siphash solves it - and it sounds like some
tracepoints are in order, at least.
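To make that concrete, the change I'm going to try is roughly the
following - just a sketch, the name is a placeholder, and folding
rhashtable's 32-bit seed into a siphash key this way is only one option:

	#include <linux/siphash.h>

	static u32 vfs_inode_hashfn(const void *data, u32 len, u32 seed)
	{
		/* Fold rhashtable's 32-bit seed into a siphash key;
		 * placeholder scheme, a real version would want to think
		 * harder about where the key material comes from. */
		siphash_key_t key = { .key = { seed, seed } };

		return (u32)siphash(data, len, &key);
	}

wired up via .hashfn in the rhashtable_params. If the -EBUSY reports go
away with that, it points pretty strongly at jhash distribution rather
than anything on the rehash side.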