Date:   Sat, 26 Feb 2022 19:08:47 -0800
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Hou Tao <houtao1@...wei.com>
Cc:     Martin KaFai Lau <kafai@...com>, Yonghong Song <yhs@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Song Liu <songliubraving@...com>,
        John Fastabend <john.fastabend@...il.com>,
        Network Development <netdev@...r.kernel.org>,
        bpf <bpf@...r.kernel.org>, Joanne Koong <joannekoong@...com>
Subject: Re: [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab

On Sat, Feb 26, 2022 at 4:16 AM Hou Tao <houtao1@...wei.com> wrote:
>
> For now, our case is write-once, so only lookup is considered. When the data
> set is bigger than 128K elements, the hash table has better lookup
> performance, as shown below:
>
> | lookup all elem (ms) | 4K  | 16K  | 64K  | 128K  | 256K  | 512K  | 1M     | 2M     | 4M      | 7M      |
> | -------------------- | --- | ---- | ---- | ----- | ----- | ----- | ------ | ------ | ------- | ------- |
> | hash                 | 3.3 | 12.7 | 47   | 90.6  | 185.9 | 382.3 | 788.5  | 1622.4 | 3296    | 6248.7  |
> | tries                | 2   | 10   | 45.9 | 111.6 | 274.6 | 688.9 | 1747.2 | 4394.5 | 11229.8 | 27148.8 |
> | tst                  | 3.8 | 16.4 | 61.3 | 139.1 | 313.9 | 707.3 | 1641.3 | 3856.1 | 9002.3  | 19793.8 |

Yeah. It's hard to beat a hash lookup when it's hitting the good O(1) case,
but what are the chances that it stays this way?
Are you saying you can size up the table and populate it to a good fill
ratio just once?
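
Just to illustrate what I mean by sizing it up, a rough userspace sketch,
assuming libbpf's bpf_map_create(); the 2x headroom and the names are made
up, not from your series:

/* Pre-size the htab once with headroom over the expected element count,
 * so the one-time populate leaves it at a comfortable load factor.
 */
#include <linux/bpf.h>
#include <bpf/bpf.h>

int create_presized_htab(__u32 expected_elems, __u32 key_size, __u32 value_size)
{
        __u32 max_entries = expected_elems * 2; /* illustrative 2x headroom */

        return bpf_map_create(BPF_MAP_TYPE_HASH, "presized_htab",
                              key_size, value_size, max_entries, NULL);
}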

If so, it's probably better to replace all strings with something
like a good hash.
7M elements is not a lot. A hash producing 8 or 16 bytes will have close
to zero false positives.
And in the "populate the table once" case the hash seed can be
precomputed and adjusted, so you can guarantee zero collisions
for the 7M strings. While the lookup side can still have a 0.001% chance
of a false positive, there could be a way to deal with that after the lookup.
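
To make the "replace strings with a good hash" part concrete, a rough
BPF-side sketch; FNV-1a, the map size, the length bound and the names are
just stand-ins for whatever seeded hash and limits you'd actually pick:

/* Sketch only: hash the string down to a fixed 8-byte digest and use that
 * as the htab key instead of the string itself.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define MAX_STR_LEN 64                          /* assumed bound on key length */

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 8 * 1024 * 1024);   /* headroom over ~7M keys */
        __type(key, __u64);
        __type(value, __u32);
} digest_htab SEC(".maps");

static __always_inline __u64 fnv1a_64(const char *s, __u32 len)
{
        __u64 h = 0xcbf29ce484222325ULL;        /* seed could be folded in here */
        __u32 i;

        for (i = 0; i < MAX_STR_LEN && i < len; i++) {
                h ^= (__u8)s[i];
                h *= 0x100000001b3ULL;
        }
        return h;
}

/* 's' is assumed to live in a bounded, verifier-visible buffer. */
static __always_inline __u32 *lookup_by_string(const char *s, __u32 len)
{
        __u64 digest = fnv1a_64(s, len);

        return bpf_map_lookup_elem(&digest_htab, &digest);
}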

> The ternary search tree always has the lowest memory usage:
>
> | memory usage (MB) | 4K  | 16K | 64K  | 128K | 256K | 512K | 1M   | 2M    | 4M    | 7M     |
> | ----------------- | --- | --- | ---- | ---- | ---- | ---- | ---- | ----- | ----- | ------ |
> | hash              | 2.2 | 8.9 | 35.5 | 71   | 142  | 284  | 568  | 1136  | 2272  | 4302.5 |
> | tries             | 2.1 | 8.5 | 34   | 68   | 136  | 272  | 544  | 1088  | 2176  | 4106.9 |
> | tst               | 0.5 | 1.6 | 5.6  | 10.6 | 20.3 | 38.6 | 73.1 | 138.6 | 264.6 | 479.5  |
>

The ternary search tree looks amazing.
Since you have a prototype, can you wrap it into a new type of bpf map
and post the patches?
I wonder what the data structures look like to achieve such memory efficiency.
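
For reference, the textbook ternary search tree node is something like the
below: one byte of the key per node plus three child pointers, with shared
prefixes stored only once. Just my guess at the general shape, not the
actual layout in your prototype.

/* Classic TST layout: small fixed-size nodes, common prefixes shared. */
struct tst_node {
        unsigned char ch;               /* one byte of the key */
        struct tst_node *lo;            /* keys whose byte here is smaller */
        struct tst_node *eq;            /* next byte of keys that match 'ch' */
        struct tst_node *hi;            /* keys whose byte here is larger */
        void *value;                    /* non-NULL only on terminal nodes */
};

static void *tst_lookup(const struct tst_node *node,
                        const unsigned char *key, unsigned int len)
{
        unsigned int i = 0;

        while (node && i < len) {
                if (key[i] < node->ch) {
                        node = node->lo;
                } else if (key[i] > node->ch) {
                        node = node->hi;
                } else {
                        if (++i == len)
                                return node->value;
                        node = node->eq;
                }
        }
        return NULL;
}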
