netdev - Re: [RFC PATCH bpf-next 0/2] bpf: Introduce ternary search tree for string key

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5a990687-b336-6f44-589b-8bd972882beb@huawei.com>
Date:   Sat, 9 Apr 2022 11:07:56 +0800
From:   Hou Tao <houtao1@...wei.com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>
CC:     Alexei Starovoitov <ast@...nel.org>, Yonghong Song <yhs@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Martin KaFai Lau <kafai@...com>,
        Andrii Nakryiko <andrii@...nel.org>,
        Song Liu <songliubraving@...com>,
        KP Singh <kpsingh@...nel.org>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>
Subject: Re: [RFC PATCH bpf-next 0/2] bpf: Introduce ternary search tree for
 string key

Hi,

On 4/7/2022 1:38 AM, Andrii Nakryiko wrote:
> On Thu, Mar 31, 2022 at 5:04 AM Hou Tao <houtao1@...wei.com> wrote:
>> Hi,
>>
>> The initial motivation for the patchset is due to the suggestion of Alexei.
>> During the discuss of supporting of string key in hash-table, he saw the
>> space efficiency of ternary search tree under our early test and suggest
>> us to post it as a new bpf map [1].
>>
>> Ternary search tree is a special trie where nodes are arranged in a
>> manner similar to binary search tree, but with up to three children
>> rather than two. The three children correpond to nodes whose value is
>> less than, equal to, and greater than the value of current node
>> respectively.
>>
>> In ternary search tree map, only the valid content of string is saved.
>> The trailing null byte and unused bytes after it are not saved. If there
>> are common prefixes between these strings, the prefix is only saved once.
>> Compared with other space optimized trie (e.g. HAT-trie, succinct trie),
>> the advantage of ternary search tree is simple and being writeable.
>>
>> Below are diagrams for ternary search map when inserting hello, he,
>> test and tea into it:
>>
>> 1. insert "hello"
>>
>>         [ hello ]
>>
>> 2. insert "he": need split "hello" into "he" and "llo"
>>
>>          [ he ]
>>             |
>>             *
>>             |
>>          [ llo ]
>>
>> 3. insert "test": add it as right child of "he"
>>
>>          [ he ]
>>             |
>>             *-------x
>>             |       |
>>          [ llo ] [ test ]
>>
>> 5. insert "tea": split "test" into "te" and "st",
>>    and insert "a" as left child of "st"
>>
>>          [ he ]
>>             |
>>      x------*-------x
>>      |      |       |
>>   [ ah ] [ llo ] [ te ]
>>                     |
>>                     *
>>                     |
>>                  [ st ]
>>                     |
>>                x----*
>>                |
>>              [ a ]
>>
>> As showed in above diagrams, the common prefix between "test" and "tea"
>> is "te" and it only is saved once. Also add benchmarks to compare the
>> memory usage and lookup performance between ternary search tree and
>> hash table. When the common prefix is lengthy (~192 bytes) and the
>> length of suffix is about 64 bytes, there are about 2~3 folds memory
>> saving compared with hash table. But the memory saving comes at prices:
>> the lookup performance of tst is about 2~3 slower compared with hash
>> table. See more benchmark details on patch #2.
>>
>> Comments and suggestions are always welcome.
>>
> Have you heard and tried qp-trie ([0]) by any chance? It is elegant
> and simple data structure. By all the available benchmarks it handily
> beats Red-Black trees in terms of memory usage and performance (though
> it of course depends on the data set, just like "memory compression"
> for ternary tree of yours depends on large set of common prefixes).
> qp-trie based BPF map seems (at least on paper) like a better
> general-purpose BPF map that is dynamically sized (avoiding current
> HASHMAP limitations) and stores keys in sorted order (and thus allows
> meaningful ordered iteration *and*, importantly for longest prefix
> match tree, allows efficient prefix matches). I did a quick experiment
> about a month ago trying to replace libbpf's internal use of hashmap
> with qp-trie for BTF string dedup and it was slightly slower than
> hashmap (not surprisingly, though, because libbpf over-sizes hashmap
> to avoid hash collisions and long chains in buckets), but it was still
> very decent even in that scenario. So I've been mulling the idea of
> implementing BPF map based on qp-trie elegant design and ideas, but
> can't find time to do this.
I have heard about it when check the space efficient of HAT trie [0], because
qp-trie needs to save the whole string key in the leaf node and its space
efficiency can not be better than ternary search tree for strings with common
prefix, so I did not consider about it. But I will do some benchmarks to check
the lookup performance and space efficiency of qp-trie and tst for string with
common prefix and strings without much common prefix.
If qp-trie is better, I think I can take the time to post it as a bpf map if you
are OK with that.


>
> This prefix sharing is nice when you have a lot of long common
> prefixes, but I'm a bit skeptical that as a general-purpose BPF data
> structure it's going to be that beneficial. 192 bytes of common
> prefixes seems like a very unusual dataset :)
Yes. The case with common prefix I known is full file path.
> More specifically about TST implementation in your paches. One global
> per-map lock I think is a very big downside. We have LPM trie which is
> very slow in big part due to global lock. It might be possible to
> design more granular schema for TST, but this whole in-place splitting
> logic makes this harder. I think qp-trie can be locked in a granular
> fashion much more easily by having a "hand over hand" locking: lock
> parent, find child, lock child, unlock parent, move into child node.
> Something like that would be more scalable overall, especially if the
> access pattern is not focused on a narrow set of nodes.
Yes. The global lock is a problem but the splitting is not in-place. I will try
to figure out whether the lock can be more scalable after the benchmark test
between qp-trie and tst.

Regards,
Tao

[0]: https://github.com/Tessil/hat-trie
>
> Anyways, I love data structures and this one is an interesting idea.
> But just my few cents of "production-readiness" for general-purpose
> data structures for BPF.
>
>   [0] https://dotat.at/prog/qp/README.html
>
>> Regards,
>> Tao
>>
>> [1]: https://lore.kernel.org/bpf/CAADnVQJUJp3YBcpESwR3Q1U6GS1mBM=Vp-qYuQX7eZOaoLjdUA@mail.gmail.com/
>>
>> Hou Tao (2):
>>   bpf: Introduce ternary search tree for string key
>>   selftests/bpf: add benchmark for ternary search tree map
>>
>>  include/linux/bpf_types.h                     |   1 +
>>  include/uapi/linux/bpf.h                      |   1 +
>>  kernel/bpf/Makefile                           |   1 +
>>  kernel/bpf/bpf_tst.c                          | 411 +++++++++++++++++
>>  tools/include/uapi/linux/bpf.h                |   1 +
>>  tools/testing/selftests/bpf/Makefile          |   5 +-
>>  tools/testing/selftests/bpf/bench.c           |   6 +
>>  .../selftests/bpf/benchs/bench_tst_map.c      | 415 ++++++++++++++++++
>>  .../selftests/bpf/benchs/run_bench_tst.sh     |  54 +++
>>  tools/testing/selftests/bpf/progs/tst_bench.c |  70 +++
>>  10 files changed, 964 insertions(+), 1 deletion(-)
>>  create mode 100644 kernel/bpf/bpf_tst.c
>>  create mode 100644 tools/testing/selftests/bpf/benchs/bench_tst_map.c
>>  create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_tst.sh
>>  create mode 100644 tools/testing/selftests/bpf/progs/tst_bench.c
>>
>> --
>> 2.31.1
>>
> .