linux-kernel - Re: [PATCH 0/1] Rosebush, a new hash table

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9c0aad2c-548a-4287-b3d5-c7932f40c96f@bytedance.com>
Date: Fri, 23 Feb 2024 19:37:20 +0800
From: Peng Zhang <zhangpeng.00@...edance.com>
To: "Matthew Wilcox (Oracle)" <willy@...radead.org>
Cc: Thomas Graf <tgraf@...g.ch>, Herbert Xu <herbert@...dor.apana.org.au>,
 netdev@...r.kernel.org, linux-fsdevel@...r.kernel.org,
 maple-tree@...ts.infradead.org, rcu@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/1] Rosebush, a new hash table



在 2024/2/23 04:37, Matthew Wilcox (Oracle) 写道:
> Rosebush is a resizing, scalable, cache-aware, RCU optimised hash table.
> I've written a load of documentation about how it works, mostly in
> Documentation/core-api/rosebush.rst but some is dotted through the
> rosebush.c file too.
> 
> You can see this code as a further exploration of the "Linked lists are
> evil" design space.  For the workloads which a hashtable is suited to,
> it has lower overhead than either the maple tree or the rhashtable.
> It cannot support ranges (so is not a replacement for the maple tree),
> but it does have per-bucket locks so is more scalable for write-heavy
> workloads.  I suspect one could reimplement the rhashtable API on top
> of the rosebush, but I am not interested in doing that work myself.
> 
> The per-object overhead is 12 bytes, as opposed to 16 bytes for the
> rhashtable and 32 bytes for the maple tree.  The constant overhead is also
> small, being just 16 bytes for the struct rosebush.  The exact amount
> of memory consumed for a given number of objects is going to depend on
> the distribution of hashes; here are some estimated consumptions for
> power-of-ten entries distributed evenly over a 32-bit hash space in the
> various data structures:
> 
> number	xarray	maple	rhash	rosebush
> 1	3472	272	280	256
> 10	32272	784	424	256
> 100	262kB	3600	1864	2080
> 1000	[1]	34576	17224	16432
> 10k	[1]	343k	168392	131344
> 100k	[1]	3.4M	1731272	2101264
> 
> As you can see, rosebush and rhashtable are close the whole way.
> Rosebush moves in larger chunks because it doubles each time; there's
> no actual need to double the bucket size, but that works well with
> the slab allocator's existing slabs.  As noted in the documentation,
> we could create our own slabs and get closer to the 12 bytes per object
> minimum consumption. [2]
> 
> Where I expect rosebush to shine is on dependent cache misses.
> I've assumed an average chain length of 10 for rhashtable in the above
> memory calculations.  That means on average a lookup would take five cache
> misses that can't be speculated.  Rosebush does a linear walk of 4-byte
> hashes looking for matches, so the CPU can usefully speculate the entire
> array of hash values (indeed, we tell it exactly how many we're going to
> look at) and then take a single cache miss fetching the correct pointer.
> Add that to the cache miss to fetch the bucket and that's just two cache
> misses rather than five.
> 
> I have not yet converted any code to use the rosebush.  The API is
> designed for use by the dcache, and I expect it will evolve as it actually
Hello,

It seems that the advantage of this hash table is that it does not need to
traverse the linked list and has fewer cache misses. I want to know how
much performance improvement is expected if it is applied to dcache?

Thanks,
Peng
> gets used.  I think there's probably some more refactoring to be done.
> I am not aware of any bugs, but the test suite is pitifully small.
> The current algorithm grows the buckets more aggressively than the table;
> that's probably exactly the wrong thing to do for good performance.
> 
> This version is full of debugging printks.  You should probably take
> them out if you're going to try to benchmark it.  The regex '^print'
> should find them all.  Contributions welcome; I really want to get back
> to working on folios, but this felt like an urgent problem to be fixed.
> 
> [1] I stopped trying to estimate the memory costs for xarray; I couldn't
> be bothered to as it's not a serious competitor for this use case.
> 
> [2] We have ideas for improving the maple tree memory consumption for
> this kind of workload; a new node type for pivots that fit in 4 bytes and
> sparse nodes to avoid putting a NULL entry after each occupied entry.
> The maple tree really is optimised for densely packed ranges at the
> moment.
> 
> Matthew Wilcox (Oracle) (1):
>    rosebush: Add new data structure
> 
>   Documentation/core-api/index.rst    |   1 +
>   Documentation/core-api/rosebush.rst | 135 ++++++
>   MAINTAINERS                         |   8 +
>   include/linux/rosebush.h            |  41 ++
>   lib/Kconfig.debug                   |   3 +
>   lib/Makefile                        |   3 +-
>   lib/rosebush.c                      | 707 ++++++++++++++++++++++++++++
>   lib/test_rosebush.c                 | 135 ++++++
>   8 files changed, 1032 insertions(+), 1 deletion(-)
>   create mode 100644 Documentation/core-api/rosebush.rst
>   create mode 100644 include/linux/rosebush.h
>   create mode 100644 lib/rosebush.c
>   create mode 100644 lib/test_rosebush.c
>