Date:   Sat, 26 Feb 2022 19:08:47 -0800
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Hou Tao <houtao1@...wei.com>
Cc:     Martin KaFai Lau <kafai@...com>, Yonghong Song <yhs@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Song Liu <songliubraving@...com>,
        John Fastabend <john.fastabend@...il.com>,
        Network Development <netdev@...r.kernel.org>,
        bpf <bpf@...r.kernel.org>, Joanne Koong <joannekoong@...com>
Subject: Re: [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab

On Sat, Feb 26, 2022 at 4:16 AM Hou Tao <houtao1@...wei.com> wrote:
>
> For now, our case is write-once, so only lookup is considered. When the data
> set is bigger than 128K elements, the hash table has better lookup
> performance, as shown below:
>
> | lookup all elem (ms) | 4K  | 16K  | 64K  | 128K  | 256K  | 512K  | 1M     | 2M     | 4M      | 7M      |
> | -------------------- | --- | ---- | ---- | ----- | ----- | ----- | ------ | ------ | ------- | ------- |
> | hash                 | 3.3 | 12.7 | 47   | 90.6  | 185.9 | 382.3 | 788.5  | 1622.4 | 3296    | 6248.7  |
> | tries                | 2   | 10   | 45.9 | 111.6 | 274.6 | 688.9 | 1747.2 | 4394.5 | 11229.8 | 27148.8 |
> | tst                  | 3.8 | 16.4 | 61.3 | 139.1 | 313.9 | 707.3 | 1641.3 | 3856.1 | 9002.3  | 19793.8 |

Yeah. It's hard to beat a hash lookup when it's hitting the good O(1) case,
but what are the chances that it stays this way?
Are you saying you can size up the table and populate it to a good fill
ratio just once?
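
Just to illustrate what I mean by sizing it up, a rough userspace sketch,
assuming libbpf's bpf_map_create(); the 2x headroom and the names are made
up, not from your series:

/* Pre-size the htab once with headroom over the expected element count,
 * so the one-time populate leaves it at a comfortable load factor.
 */
#include <linux/bpf.h>
#include <bpf/bpf.h>

int create_presized_htab(__u32 expected_elems, __u32 key_size, __u32 value_size)
{
        __u32 max_entries = expected_elems * 2; /* illustrative 2x headroom */

        return bpf_map_create(BPF_MAP_TYPE_HASH, "presized_htab",
                              key_size, value_size, max_entries, NULL);
}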

If so, it's probably better to replace all strings with something
like a good hash.
7M elements is not a lot. A hash producing 8 or 16 bytes will have close
to zero false positives.
And in the "populate the table once" case the hash seed can be
precomputed and adjusted, so you can guarantee zero collisions
for the 7M strings. While the lookup side can still have a 0.001% chance
of a false positive, there could be a way to deal with that after the lookup.
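
To make the "replace strings with a good hash" part concrete, a rough
BPF-side sketch; FNV-1a, the map size, the length bound and the names are
just stand-ins for whatever seeded hash and limits you'd actually pick:

/* Sketch only: hash the string down to a fixed 8-byte digest and use that
 * as the htab key instead of the string itself.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define MAX_STR_LEN 64                          /* assumed bound on key length */

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 8 * 1024 * 1024);   /* headroom over ~7M keys */
        __type(key, __u64);
        __type(value, __u32);
} digest_htab SEC(".maps");

static __always_inline __u64 fnv1a_64(const char *s, __u32 len)
{
        __u64 h = 0xcbf29ce484222325ULL;        /* seed could be folded in here */
        __u32 i;

        for (i = 0; i < MAX_STR_LEN && i < len; i++) {
                h ^= (__u8)s[i];
                h *= 0x100000001b3ULL;
        }
        return h;
}

/* 's' is assumed to live in a bounded, verifier-visible buffer. */
static __always_inline __u32 *lookup_by_string(const char *s, __u32 len)
{
        __u64 digest = fnv1a_64(s, len);

        return bpf_map_lookup_elem(&digest_htab, &digest);
}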

> The ternary search tree always has the lowest memory usage:
>
> | memory usage (MB) | 4K  | 16K | 64K  | 128K | 256K | 512K | 1M   | 2M    | 4M    | 7M     |
> | ----------------- | --- | --- | ---- | ---- | ---- | ---- | ---- | ----- | ----- | ------ |
> | hash              | 2.2 | 8.9 | 35.5 | 71   | 142  | 284  | 568  | 1136  | 2272  | 4302.5 |
> | tries             | 2.1 | 8.5 | 34   | 68   | 136  | 272  | 544  | 1088  | 2176  | 4106.9 |
> | tst               | 0.5 | 1.6 | 5.6  | 10.6 | 20.3 | 38.6 | 73.1 | 138.6 | 264.6 | 479.5  |
>

The ternary search tree looks amazing.
Since you have a prototype, can you wrap it into a new type of bpf map
and post the patches?
I wonder what the data structures look like to achieve such memory efficiency.
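
For reference, the textbook ternary search tree node is something like the
below: one byte of the key per node plus three child pointers, with shared
prefixes stored only once. Just my guess at the general shape, not the
actual layout in your prototype.

/* Classic TST layout: small fixed-size nodes, common prefixes shared. */
struct tst_node {
        unsigned char ch;               /* one byte of the key */
        struct tst_node *lo;            /* keys whose byte here is smaller */
        struct tst_node *eq;            /* next byte of keys that match 'ch' */
        struct tst_node *hi;            /* keys whose byte here is larger */
        void *value;                    /* non-NULL only on terminal nodes */
};

static void *tst_lookup(const struct tst_node *node,
                        const unsigned char *key, unsigned int len)
{
        unsigned int i = 0;

        while (node && i < len) {
                if (key[i] < node->ch) {
                        node = node->lo;
                } else if (key[i] > node->ch) {
                        node = node->hi;
                } else {
                        if (++i == len)
                                return node->value;
                        node = node->eq;
                }
        }
        return NULL;
}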
