linux-kernel - Re: [RFC PATCH v2] bpf: Using binary search to improve the performance of btf_find_by_name

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0d8b7948-12b4-4af5-83df-b6998980398a@sangfor.com.cn>
Date:   Wed, 13 Sep 2023 18:52:43 +0800
From:   pengdonglin <pengdonglin@...gfor.com.cn>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Eduard Zingerman <eddyz87@...il.com>
Cc:     Martin KaFai Lau <martin.lau@...ux.dev>,
        Alexei Starovoitov <ast@...nel.org>,
        Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>, dinghui@...gfor.com.cn,
        huangcun@...gfor.com.cn, bpf <bpf@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH v2] bpf: Using binary search to improve the
 performance of btf_find_by_name_kind

On 2023/9/13 0:40, Alexei Starovoitov wrote:
> On Tue, Sep 12, 2023 at 7:19 AM Eduard Zingerman <eddyz87@...il.com> wrote:
>>
>> On Tue, 2023-09-12 at 16:51 +0300, Eduard Zingerman wrote:
>>> On Sat, 2023-09-09 at 02:16 -0700, Donglin Peng wrote:
>>>> Currently, we are only using the linear search method to find the type id
>>>> by the name, which has a time complexity of O(n). This change involves
>>>> sorting the names of btf types in ascending order and using binary search,
>>>> which has a time complexity of O(log(n)). This idea was inspired by the
>>>> following patch:
>>>>
>>>> 60443c88f3a8 ("kallsyms: Improve the performance of kallsyms_lookup_name()").
>>>>
>>>> At present, this improvement is only for searching in vmlinux's and
>>>> module's BTFs, and the kind should only be BTF_KIND_FUNC or BTF_KIND_STRUCT.
>>>>
>>>> Another change is the search direction, where we search the BTF first and
>>>> then its base, the type id of the first matched btf_type will be returned.
>>>>
>>>> Here is a time-consuming result that finding all the type ids of 67,819 kernel
>>>> functions in vmlinux's BTF by their names:
>>>>
>>>> Before: 17000 ms
>>>> After:     10 ms
>>>>
>>>> The average lookup performance has improved about 1700x at the above scenario.
>>>>
>>>> However, this change will consume more memory, for example, 67,819 kernel
>>>> functions will allocate about 530KB memory.
>>>
>>> Hi Donglin,
>>>
>>> I think this is a good improvement. However, I wonder, why did you
>>> choose to have a separate name map for each BTF kind?
>>>
>>> I did some analysis for my local testing kernel config and got such numbers:
>>> - total number of BTF objects: 97350
>>> - number of FUNC and STRUCT objects: 51597
>>> - number of FUNC, STRUCT, UNION, ENUM, ENUM64, TYPEDEF, DATASEC objects: 56817
>>>    (these are all kinds for which lookup by name might make sense)
>>> - number of named objects: 54246
>>> - number of name collisions:
>>>    - unique names: 53985 counts
>>>    - 2 objects with the same name: 129 counts
>>>    - 3 objects with the same name: 3 counts
>>>
>>> So, it appears that having a single map for all named objects makes
>>> sense and would also simplify the implementation, what do you think?
>>
>> Some more numbers for my config:
>> - 13241 types (struct, union, typedef, enum), log2 13241 = 13.7
>> - 43575 funcs, log2 43575 = 15.4
>> Thus, having separate map for types vs functions might save ~1.7
>> search iterations. Is this a significant slowdown in practice?
> 
> What do you propose to do in case of duplicates ?
> func and struct can have the same name, but they will have two different
> btf_ids. How do we store them ?
> Also we might add global vars to BTF. Such request came up several times.
> So we need to make sure our search approach scales to
> func, struct, vars. I don't recall whether we search any other kinds.
> Separate arrays for different kinds seems ok.
> It's a bit of code complexity, but it's not an increase in memory.
> With 13k structs and 43k funcs it's 56k * (4 + 4) that's 0.5 Mbyte
> extra memory. That's quite a bit. Anything we can do to compress it?
> Folks requested vmlinux BTF to be a module, so it's loaded on demand.
> BTF memory consumption is a concern to many.
> I think before we add these per-kind search arrays we better make
> BTF optional as a module.

I think that making BTF as a module may not have much significance, since
the function bpf_get_btf_vmlinux is invoked in many places, such as during
loading a kernel module.

>