Date:   Sun, 17 Sep 2023 17:07:47 +0800
From:   pengdonglin <pengdonglin@...gfor.com.cn>
To:     Alan Maguire <alan.maguire@...cle.com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Eduard Zingerman <eddyz87@...il.com>
Cc:     Martin KaFai Lau <martin.lau@...ux.dev>,
        Alexei Starovoitov <ast@...nel.org>,
        Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>, dinghui@...gfor.com.cn,
        huangcun@...gfor.com.cn, bpf <bpf@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH v2] bpf: Using binary search to improve the
 performance of btf_find_by_name_kind

On 2023/9/15 1:14, Alan Maguire wrote:
> On 14/09/2023 14:05, pengdonglin wrote:
>> On 2023/9/14 20:46, Alan Maguire wrote:
>>> On 14/09/2023 11:13, pengdonglin wrote:
>>>> On 2023/9/13 21:34, Alan Maguire wrote:
>>>>> On 13/09/2023 11:32, pengdonglin wrote:
>>>>>> On 2023/9/13 2:46, Alexei Starovoitov wrote:
>>>>>>> On Tue, Sep 12, 2023 at 10:03 AM Eduard Zingerman <eddyz87@...il.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Tue, 2023-09-12 at 09:40 -0700, Alexei Starovoitov wrote:
>>>>>>>>> On Tue, Sep 12, 2023 at 7:19 AM Eduard Zingerman
>>>>>>>>> <eddyz87@...il.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, 2023-09-12 at 16:51 +0300, Eduard Zingerman wrote:
>>>>>>>>>>> On Sat, 2023-09-09 at 02:16 -0700, Donglin Peng wrote:
>>>>>>>>>>>> Currently, we only use linear search to find a type id by
>>>>>>>>>>>> name, which has a time complexity of O(n). This change sorts
>>>>>>>>>>>> the names of btf types in ascending order and uses binary
>>>>>>>>>>>> search, which has a time complexity of O(log(n)). This idea
>>>>>>>>>>>> was inspired by the following patch:
>>>>>>>>>>>>
>>>>>>>>>>>> 60443c88f3a8 ("kallsyms: Improve the performance of
>>>>>>>>>>>> kallsyms_lookup_name()").
>>>>>>>>>>>>
>>>>>>>>>>>> At present, this improvement only applies to searching in
>>>>>>>>>>>> vmlinux's and modules' BTFs, and the kind should only be
>>>>>>>>>>>> BTF_KIND_FUNC or BTF_KIND_STRUCT.
>>>>>>>>>>>>
>>>>>>>>>>>> Another change is the search direction: we search the BTF
>>>>>>>>>>>> first and then its base, and the type id of the first
>>>>>>>>>>>> matching btf_type is returned.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is a timing result for finding the type ids of all
>>>>>>>>>>>> 67,819 kernel functions in vmlinux's BTF by name:
>>>>>>>>>>>>
>>>>>>>>>>>> Before: 17000 ms
>>>>>>>>>>>> After:     10 ms
>>>>>>>>>>>>
>>>>>>>>>>>> The average lookup performance improved about 1700x in the
>>>>>>>>>>>> above scenario.
>>>>>>>>>>>>
>>>>>>>>>>>> However, this change consumes more memory; for example,
>>>>>>>>>>>> 67,819 kernel functions take about 530KB of extra memory.
>>>>>>>>>>>
>>>>>>>>>>> Hi Donglin,
>>>>>>>>>>>
>>>>>>>>>>> I think this is a good improvement. However, I wonder, why did
>>>>>>>>>>> you
>>>>>>>>>>> choose to have a separate name map for each BTF kind?
>>>>>>>>>>>
>>>>>>>>>>> I did some analysis for my local testing kernel config and got
>>>>>>>>>>> such numbers:
>>>>>>>>>>> - total number of BTF objects: 97350
>>>>>>>>>>> - number of FUNC and STRUCT objects: 51597
>>>>>>>>>>> - number of FUNC, STRUCT, UNION, ENUM, ENUM64, TYPEDEF, DATASEC
>>>>>>>>>>> objects: 56817
>>>>>>>>>>>       (these are all kinds for which lookup by name might make
>>>>>>>>>>> sense)
>>>>>>>>>>> - number of named objects: 54246
>>>>>>>>>>> - number of name collisions:
>>>>>>>>>>>       - unique names: 53985 counts
>>>>>>>>>>>       - 2 objects with the same name: 129 counts
>>>>>>>>>>>       - 3 objects with the same name: 3 counts
>>>>>>>>>>>
>>>>>>>>>>> So, it appears that having a single map for all named
>>>>>>>>>>> objects makes sense and would also simplify the
>>>>>>>>>>> implementation. What do you think?
>>>>>>>>>>
>>>>>>>>>> Some more numbers for my config:
>>>>>>>>>> - 13241 types (struct, union, typedef, enum), log2 13241 = 13.7
>>>>>>>>>> - 43575 funcs, log2 43575 = 15.4
>>>>>>>>>> Thus, having a separate map for types vs. functions might save
>>>>>>>>>> ~1.7 search iterations. Is this a significant slowdown in
>>>>>>>>>> practice?
>>>>>>>>>
>>>>>>>>> What do you propose to do in case of duplicates?
>>>>>>>>> func and struct can have the same name, but they will have two
>>>>>>>>> different btf_ids. How do we store them?
>>>>>>>>> Also we might add global vars to BTF. Such request came up several
>>>>>>>>> times.
>>>>>>>>> So we need to make sure our search approach scales to
>>>>>>>>> func, struct, vars. I don't recall whether we search any other
>>>>>>>>> kinds.
>>>>>>>>> Separate arrays for different kinds seems ok.
>>>>>>>>> It's a bit of code complexity, but it's not an increase in memory.
>>>>>>>>
>>>>>>>> Binary search gives, say, the lowest index of an entry with
>>>>>>>> name A; then increment the index while the name remains A,
>>>>>>>> looking for the correct kind.
>>>>>>>> Given the name conflict info from above, 99% of the time there
>>>>>>>> would be no need to iterate, and in the very few remaining cases
>>>>>>>> there would be a couple of iterations.
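>>>>>>>>
>>>>>>>> A sketch of that scan (assuming a single map sorted by name;
>>>>>>>> name_eq is an invented helper comparing a map entry's name, and
>>>>>>>> map/nr/kind come from the surrounding lookup):
>>>>>>>>
>>>>>>>> /* idx = lowest index whose name equals 'name', from bsearch */
>>>>>>>> for (; idx < nr && name_eq(btf, &map[idx], name); idx++) {
>>>>>>>>         const struct btf_type *t = btf_type_by_id(btf, map[idx].id);
>>>>>>>>
>>>>>>>>         if (btf_kind(t) == kind)
>>>>>>>>                 return map[idx].id;
>>>>>>>> }
>>>>>>>> return -ENOENT;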
>>>>>>>>
>>>>>>>> The same logic would be necessary with the current approach if
>>>>>>>> different BTF kinds were allowed in the BTF_ID_NAME_* cohorts.
>>>>>>>> I figured that these cohorts are mainly a way to split the tree
>>>>>>>> for faster lookups, but maybe that is not the main intent.
>>>>>>>>
>>>>>>>>> With 13k structs and 43k funcs it's 56k * (4 + 4) bytes, i.e.
>>>>>>>>> ~0.5 Mbyte of extra memory. That's quite a bit. Anything we can
>>>>>>>>> do to compress it?
>>>>>>>>
>>>>>>>> That's an interesting question; off the top of my head:
>>>>>>>> pre-sort in pahole (re-assign IDs so that an increasing ID also
>>>>>>>> means an "increasing" name), which shouldn't be that difficult.
>>>>>>>
>>>>>>> That sounds great. kallsyms are pre-sorted at build time.
>>>>>>> We should do the same with BTF.
>>>>>>> I think GCC can emit BTF directly now and LLVM emits it for bpf progs
>>>>>>> too,
>>>>>>> but since vmlinux and kernel module BTFs will keep being processed
>>>>>>> through pahole we don't have to make gcc/llvm sort things right away.
>>>>>>> pahole will be enough. The kernel might do an 'is it sorted'
>>>>>>> check during BTF validation and then use binary search, or fall
>>>>>>> back to linear search when not sorted == old pahole.
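>>>>>>>
>>>>>>> Something like this (a sketch; the flag and the two helper names
>>>>>>> are invented, only the btf_find_by_name_kind signature is real):
>>>>>>>
>>>>>>> /* at BTF validation time: remember whether names arrived sorted */
>>>>>>> btf->names_sorted = btf_check_names_sorted(btf);
>>>>>>>
>>>>>>> s32 btf_find_by_name_kind(const struct btf *btf,
>>>>>>>                           const char *name, u8 kind)
>>>>>>> {
>>>>>>>         if (btf->names_sorted)
>>>>>>>                 return btf_find_by_name_bsearch(btf, name, kind);
>>>>>>>         /* old pahole: fall back to the linear scan */
>>>>>>>         return btf_find_by_name_linear(btf, name, kind);
>>>>>>> }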
>>>>>>>
>>>>>>
>>>>>> Yeah, I agree and will attempt to modify pahole and run a test.
>>>>>> Do we need to introduce a new macro to control the behavior when
>>>>>> the BTF is not sorted? If it is not sorted, we can use the method
>>>>>> mentioned in this patch or fall back to linear search.
>>>>>>
>>>>>
>>>>> One challenge with pahole is that it often runs in parallel mode,
>>>>> so I suspect any sorting would have to be done after merging across
>>>>> threads. Perhaps BTF deduplication would be a useful time to
>>>>> re-sort by name? BTF dedup happens after BTF has been merged, and a
>>>>> new "sorted" btf_dedup_opts option could be added and controlled by
>>>>> a pahole option. However, dedup is pretty complicated already...
>>>>>
>>>>> One thing we should weigh up, though, is whether there are benefits
>>>>> to the way BTF is currently laid out. It tends to start with base
>>>>> types, and often-encountered types end up located towards the start
>>>>> of the BTF data. For example:
>>>>>
>>>>>
>>>>> [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64
>>>>> encoding=(none)
>>>>> [2] CONST '(anon)' type_id=1
>>>>> [3] VOLATILE '(anon)' type_id=1
>>>>> [4] ARRAY '(anon)' type_id=1 index_type_id=21 nr_elems=2
>>>>> [5] PTR '(anon)' type_id=8
>>>>> [6] CONST '(anon)' type_id=5
>>>>> [7] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
>>>>> [8] CONST '(anon)' type_id=7
>>>>> [9] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)
>>>>> [10] CONST '(anon)' type_id=9
>>>>> [11] TYPEDEF '__s8' type_id=12
>>>>> [12] INT 'signed char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
>>>>> [13] TYPEDEF '__u8' type_id=14
>>>>>
>>>>> So often-used types will be found quickly, even under linear search
>>>>> conditions.
>>>>
>>>> I found that there seems to be no code in the kernel that gets the
>>>> ID of a basic data type by calling btf_find_by_name_kind directly.
>>>> The general usage of this function is to obtain the ID of a
>>>> structure or function. Once we have the ID of a structure or
>>>> function, it is O(1) to get the IDs of its members or parameters.
>>>>
>>>> ./kernel/trace/trace_probe.c:383:       id = btf_find_by_name_kind(btf, funcname, BTF_KIND_FUNC);
>>>> ./kernel/bpf/btf.c:3523:        id = btf_find_by_name_kind(btf, value_type, BTF_KIND_STRUCT);
>>>> ./kernel/bpf/btf.c:5504:                id = btf_find_by_name_kind(btf, alloc_obj_fields[i], BTF_KIND_STRUCT);
>>>> ./kernel/bpf/bpf_struct_ops.c:128:      module_id = btf_find_by_name_kind(btf, "module", BTF_KIND_STRUCT);
>>>> ./net/ipv4/bpf_tcp_ca.c:28:     type_id = btf_find_by_name_kind(btf, "sock", BTF_KIND_STRUCT);
>>>> ./net/ipv4/bpf_tcp_ca.c:33:     type_id = btf_find_by_name_kind(btf, "tcp_sock", BTF_KIND_STRUCT);
>>>> ./net/netfilter/nf_bpf_link.c:181:      type_id = btf_find_by_name_kind(btf, name, BTF_KIND_STRUCT);
>>>>
>>>>>
>>>>> When we look at how many lookups by id (which are O(1), since they are
>>>>> done via the btf->types[] array) versus by name, we see:
>>>>>
>>>>> $ grep btf_type_by_id kernel/bpf/*.c|wc -l
>>>>> 120
>>>>> $ grep btf_find_by_nam kernel/bpf/*.c|wc -l
>>>>> 15
>>>>>
>>>>> I don't see a huge number of name-based lookups, and I think most are
>>>>> outside of the hotter codepaths, unless I'm missing some. All of which
>>>>> is to say it would be a good idea to have a clear sense of what will
>>>>> get
>>>>> faster with sorted-by-name BTF. Thanks!
>>>>
>>>> The story goes like this.
>>>>
>>>> I have added a new feature to the function graph tracer called
>>>> "funcgraph_retval"; here is the link:
>>>>
>>>> https://lore.kernel.org/all/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@sangfor.com.cn/
>>>>
>>>> We can obtain the return values of almost every function during the
>>>> execution of the kernel through this feature, which can help us
>>>> analyze problems.
>>>>
>>>
>>> It's a great feature!
>>
>> Thanks.
>>
>>>
>>>> However, this feature has two main drawbacks.
>>>>
>>>> 1. Even if a function's return type is void, a return value will
>>>> still be printed.
>>>>
>>>> 2. The printed return value may be incorrect when the width of the
>>>> return type is smaller than that of a general-purpose register.
>>>>
>>>> I think if we can get the return type of the function, then the
>>>> drawbacks mentioned above can be eliminated. The function
>>>> btf_find_by_name_kind can be used to get the ID of a kernel
>>>> function, from which we can get its return type easily. If the
>>>> return type is void, the recorded return value will not be printed.
>>>> If the width of the return type is smaller than that of a
>>>> general-purpose register, then the value stored in the upper bits
>>>> will be trimmed. I have written a demo, and these drawbacks were
>>>> resolved.
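>>>>
>>>> The demo logic is roughly this (a sketch, assuming btf, func_id and
>>>> the raw retval are in scope; signed return types would need sign
>>>> extension rather than a plain mask):
>>>>
>>>> const struct btf_type *func, *proto, *ret;
>>>>
>>>> func  = btf_type_by_id(btf, func_id);    /* BTF_KIND_FUNC */
>>>> proto = btf_type_by_id(btf, func->type); /* BTF_KIND_FUNC_PROTO */
>>>>
>>>> if (!proto->type)
>>>>         return;         /* returns void: print nothing */
>>>>
>>>> ret = btf_type_skip_modifiers(btf, proto->type, NULL);
>>>> if (btf_type_is_int(ret) && ret->size < sizeof(long))
>>>>         retval &= (1UL << (ret->size * 8)) - 1; /* trim upper bits */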
>>>>
>>>> However, during my test, I found that it took too much time to read
>>>> the trace log with this feature enabled, because the trace log
>>>> consists of 200,000 lines. The majority of the time was consumed by
>>>> btf_find_by_name_kind, which was called 200,000 times.
>>>>
>>>> So I think the performance of btf_find_by_name_kind may need to be
>>>> improved.
>>>>
>>>
>>> If I recall, Masami's work uses BTF ids, but can cache them since the
>>> user explicitly asks for specific fields in the trace output. I'm
>>> presuming that's not an option for you because funcgraph tracing
>>> enables everything (or at least everything under a filter predicate)
>>> and you have limited context to work with. Is that right?
>>
>> Yes, right.
>>
>>>
>>> Looking at print_graph_entry_leaf(), which I _think_ is where you'd
>>> need to print the retval from, you have access to the function
>>> address via call->func, and I presume you get the name by snprinting
>>> the symbol to a string or similar. So you're stuck in a context where
>>> you have the function address, and from that you can derive the
>>> function name. Is that correct? Thanks!
>>
>> Yes, both print_graph_return and print_graph_entry_leaf call
>> print_graph_retval to print the return value. It then calls
>> sprint_symbol_no_offset with call->func to get the function name, and
>> then btf_find_by_name_kind to get the return type.
>>
> 
> So what you ultimately need is a way to map from the information
> available to the size of the return value associated with a function.
> 
> On the BPF side we've been thinking a bit about the relationship
> between kernel function addresses and their BTF representations;
> sometimes knowing the BTF->address mapping is needed for cases where
> the same function name has multiple inconsistent function signatures
> in BTF. We don't represent function addresses in BTF yet, but may end
> up having to.
> The reason I mention this is that, in an ideal world, it would help to
> populate kallsyms entries with their associated BTF ids; then a
> function would need to be looked up only once in kallsyms. That lookup
> would benefit from recent speedups, and if it carried the associated
> BTF id we'd have an O(1) lookup from BTF id -> function. Not sure if
> that would be tenable from the kallsyms side, but I just wanted to
> mention it, as from the above it seems an address-based lookup is a
> possible way to solve the return value type lookup problem you're
> facing.
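>
> Conceptually (purely illustrative; no such field exists today):
>
> /* hypothetical: a kallsyms entry that also records its BTF id */
> struct kallsym_btf_entry {
>         unsigned long addr;
>         u32 btf_id;     /* 0 if the symbol has no BTF representation */
> };
>
> /* the existing addr -> symbol lookup would then also yield btf_id,
>  * making the follow-up type lookup O(1) via btf_type_by_id().
>  */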
> 
> Cc'ed Jiri, who had to wrestle with kallsyms for the kprobe multi
> stuff. Would the above help, do you think?

Thank you, but I tend to agree with Alexei. Using kallsyms would consume
extra memory too, and it can only be used for functions. I have done a
test using kallsyms_lookup + the optimized btf_find_by_name_kind, and
the time consumed was not excessive: it took about 38ms to find the IDs
of 67,823 kernel functions. Using a new map of function addresses to
IDs, it took about 5ms.
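
For reference, the 5ms variant is conceptually just an array of
(address, id) pairs sorted by address, plus a binary search; a sketch
with invented names:

struct func_addr_id {
        unsigned long addr;     /* function entry address */
        u32 id;                 /* BTF id of the FUNC type */
};

/* 'map' is sorted by addr when it is built */
static s32 btf_id_by_func_addr(const struct func_addr_id *map, u32 nr,
                               unsigned long addr)
{
        u32 lo = 0, hi = nr;

        while (lo < hi) {
                u32 mid = lo + (hi - lo) / 2;

                if (map[mid].addr < addr)
                        lo = mid + 1;
                else
                        hi = mid;
        }
        if (lo < nr && map[lo].addr == addr)
                return map[lo].id;
        return -ENOENT;
}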

> 
> Thanks!
> 
> Alan
> 
