linux-kernel - Re: BTF, pahole and static functions (was Re: [PATCH] libbpf: resolve kernel function name optimization for kprobe)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <50c0ac5d-1507-5b9e-6670-8ec654c40559@meta.com>
Date:   Mon, 23 Jan 2023 23:18:23 -0800
From:   Yonghong Song <yhs@...a.com>
To:     Alan Maguire <alan.maguire@...cle.com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Menglong Dong <menglong8.dong@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <martin.lau@...ux.dev>,
        Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        Stanislav Fomichev <sdf@...gle.com>,
        Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
        bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
        Menglong Dong <imagedong@...cent.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>
Subject: Re: BTF, pahole and static functions (was Re: [PATCH] libbpf: resolve
 kernel function name optimization for kprobe)



On 1/23/23 9:14 AM, Alan Maguire wrote:
> On 21/01/2023 05:29, Yonghong Song wrote:
>>
>>
>> On 1/20/23 5:20 AM, Alan Maguire wrote:
>>> On 18/01/2023 18:15, Yonghong Song wrote:
>>>>
>>>>
>>>> On 1/13/23 12:00 AM, Yonghong Song wrote:
>>>>>
>>>>>
>>>>> On 1/12/23 1:07 PM, Alexei Starovoitov wrote:
>>>>>> On Thu, Jan 12, 2023 at 2:20 AM Alan Maguire <alan.maguire@...cle.com> wrote:
>>>>>>>
>>>>>>> On 12/01/2023 07:23, Yonghong Song wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/9/23 7:11 PM, Menglong Dong wrote:
>>>>>>>>> On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@...a.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 1/9/23 1:42 AM, menglong8.dong@...il.com wrote:
>>>>>>>>>>> From: Menglong Dong <imagedong@...cent.com>
>>>>>>>>>>>
>>>>>>>>>>> The function name in kernel may be changed by the compiler. For example,
>>>>>>>>>>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'.
>>>>>>>>>>>
>>>>>>>>>>> This kind optimization can happen in any kernel function. Therefor, we
>>>>>>>>>>> should conside this case.
>>>>>>>>>>>
>>>>>>>>>>> If we failed to attach kprobe with a '-ENOENT', then we can lookup the
>>>>>>>>>>> kallsyms and check if there is a similar function end with '.xxx', and
>>>>>>>>>>> retry.
>>>>>>>>>>
>>>>>>>>>> This might produce incorrect result, so this approach won't work
>>>>>>>>>> for all .isra.0 cases. When a function name is changed from
>>>>>>>>>> <func> to <func>.isra.<num>, it is possible that compiler may have
>>>>>>>>>> make some changes to the arguments, e.g., removing one argument,
>>>>>>>>>> chaning a semantics of argument, etc. if bpf program still
>>>>>>>>>> uses the original function signature, the bpf program may
>>>>>>>>>> produce unexpected result.
>>>>>>>>>
>>>>>>>>> Oops, I wasn't aware of this part. Can we make this function disabled
>>>>>>>>> by default and offer an option to users to enable it? Such as:
>>>>>>>>>
>>>>>>>>>         bpf_object_adapt_sym(struct bpf_object *obj)
>>>>>>>>>
>>>>>>>>> In my case, kernel function rename is common, and I have to
>>>>>>>>> check all functions and do such adaptation before attaching
>>>>>>>>> my kprobe programs, which makes me can't use auto-attach.
>>>>>>>>>
>>>>>>>>> What's more, I haven't seen the arguments change so far, and
>>>>>>>>> maybe it's not a common case?
>>>>>>>>
>>>>>>>> I don't have statistics, but it happens. In general, if you
>>>>>>>> want to attach to a function like <foo>, but it has a variant
>>>>>>>> <foo>.isra.<num>, you probably should check assembly code
>>>>>>>> to ensure the parameter semantics not changed, and then
>>>>>>>> you can attach to kprobe function <foo>.isra.<num>, which
>>>>>>>> I assume current libbpf infrastructure should support it.
>>>>>>>> After you investigate all these <foo>.isra.<num> functions
>>>>>>>> and confirm their argument semantics won't change, you
>>>>>>>> could use kprobe multi to do attachment.
>>>>>>>>
>>>>>>>
>>>>>>> I crunched some numbers on this, and discovered out of ~1600
>>>>>>> .isra/.constprop functions, 76 had a missing argument. The patch series
>>>>>>> at [1] is a rough attempt to get pahole to spot these, and add
>>>>>>> BTF entries for each, where the BTF representation reflects
>>>>>>> reality by skipping optimized-out arguments. So for a function
>>>>>>> like
>>>>>>>
>>>>>>> static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg,
>>>>>>>                                   const struct in6_addr *gw_addr, u32 tbid,
>>>>>>>                                   int flags, struct fib6_result *res);
>>>>>>>
>>>>>>> Examining the BTF representation using pahole from [1], we see
>>>>>>>
>>>>>>> int ip6_nh_lookup_table.isra.0(struct net *net, struct fib6_config *cfg, struct in6_addr *gw_addr, u32 tbid, int flags);
>>>>>>>
>>>>>>> Comparing to the definition, we see the last parameter is missing,
>>>>>>> i.e. the "struct fib6_result *" argument is missing. The calling pattern -
>>>>>>> where the callers have a struct fib6_result on the stack and pass a pointer -
>>>>>>> is reflected in late DWARF info which shows the argument is not actually
>>>>>>> passed as a register, but can be expressed as an offset relative to the current
>>>>>>> function stack (DW_OP_fbreg).
>>>>>>>
>>>>>>> This approach howvever introduces the problem that currently the kernel
>>>>>>> doesn't  allow a "." in a function name. We can fix that, but any BTF encoding
>>>>>>> that introduced optimized functions containing a  "." would have to be opt-in
>>>>>>> via a pahole option, so we do not generate invalid vmlinux BTF for kernels
>>>>>>> without that change.
>>>>>>>
>>>>>>> An alternative approach would be to simply encode .isra functions
>>>>>>> in BTF without the .isra suffix (i.e. using "function_name" not
>>>>>>> "function_name.isra"), only doing the BTF encoding if no arguments were
>>>>>>> optimized out - i.e. if the function signature matches expectations.
>>>>>>> The 76 functions with optimized-out parameters could simply be skipped.
>>>>>>> To me that feels like the simpler approach - it avoids issues
>>>>>>> with function name BTF encoding, and with that sort of model a
>>>>>>> loose-matching kallsyms approach - like that described here - could be used
>>>>>>> for kprobes and fentry/fexit. It also fits with the DWARF representation -
>>>>>>> the .isra suffixes are not present in DWARF representations of the function,
>>>>>>> only in the symbol table and kallsyms, so perhaps BTF should follow suit
>>>>>>> and not add the suffixes. What do you think?
>>>>>>
>>>>>> Sounds like a great idea to me.
>>>>>> Addresses this issue in a clean way.
>>>>>
>>>>> Yes, the second approach seems a reasonable approach. If the number
>>>>> of parameters for the *actual* functions equals to the number
>>>>> of parameters for the defined function (abstract_origin),
>>>>> we can roughly assume the actual function signature matches
>>>>> the prototype. Although it is theoretically possible that
>>>>> compiler might change parameter types, e.g., from a
>>>>> struct pointer (struct foo *p) to a int value (p->field1).
>>>>> But this should be extremely rare and we need compiler emitting
>>>>> additional dwarf data (might through btf_decl_tag) to discover
>>>>> such cases.
>>>>
>>>
>>> Thanks! I've prototyped a solution at [1].
>>>
>>> The key problem is pahole processing compilation units separately;
>>> the issue is that for some functions, they have optimized out
>>> parameters in some compilation units but not others (NF_HOOK()
>>> does this for example).  It's a pain, especially as we want to
>>> preserve parallel BTF encoding as much as possible, so the
>>> best solution I could come up with was to save information on
>>> functions that had a suffix match in a global encoder binary
>>> tree.  Then, when we are collecting threads, they can be safely
>>> added prior to BTF merging, since at that point we have
>>> recorded if they have optimized-out parameters in any compilation
>>> unit.  There may be a better way to handle this, but I think
>>> we are stuck with comparing binary-wide to see if the
>>> parameters are consistent.  The code is (I think) careful
>>> to limit this to cases where "."-suffixed functions are found.
>>
>> I agree that for this .isra.<num> issue and later static functions
>> with different signature issue, global view is needed to make
>> proper decision.
>>
>>>
>>> In testing this however, I think there is a wider issue with
>>> BTF encoding of static functions which may require a similar global
>>> comparison mechanism.  More below...
>>>   
>>>> I checked with some compiler guys at Meta. They confirmed that clang
>>>> might have the same optimization (eliminating some function parameters
>>>> for static functions), but clang won't change function linkage name.
>>>> So that means, clang won't do static function cloning and it will
>>>> only remove function parameters if this can be applied to all call sites.
>>>>
>>>
>>> Great, that simplifies things a lot.
>>>
>>>> I checked the clang (clang16) build kernel with latest bpf-next
>>>> and didn't find a single instance that a static function's parameter
>>>> is removed.
>>>>
>>>
>>> Excellent!
>>>
>>>> Also, current libbpf kprobe API supports to pass a function address
>>>> to kernel. So user space can always do their own /proc/kallsyms search
>>>> and find func address for either regular function or function
>>>> with .isa.<num>/.constprop.<num> suffixes.
>>>
>>> Right; one way I've done this is to have a special "okprobe" section
>>> name where we support matching with a "." suffix as well as with
>>> the exact name. Might be worth adding that support to libbpf itself
>>> if the above approach lands.
>>
>> I am not sure about this. We should keep a good default to handle
>> .isra functions as we discussed earlier. If user still wants
>> to kprobe a .isra function which is filtered out by our default
>> handling, user can use addr instead, in which case, user needs
>> to check dwarf output and/or assembly to make sure they understand
>> how many arguments are actually used and how they are used.
> 
> But if we ensure BTF only encodes consistent representations of
> functions and spots optimized-out parameters, wouldn't the presence
> in BTF of an .isra function provide a guarantee parameters are not
> optimized out? Note I'm not suggesting changing "kprobe/" or "fentry/"
> semantics here, but rather having other dedicated SEC() prefixes to
> support "try for foo, but fall back to foo.isra if attach to foo fails,
> and we can find foo in BTF".

We could. I just feel this is a corner case and user have an alternative
to use address directly. This is just my opinion though.

> 
>>
>>>
>>> The bigger concern I have is in testing this I hit a problem which
>>> at first looked like a parallel BTF encoding problem, but on
>>> deeper analysis looks like a conceptual issue in how we handle
>>> static functions.
>>>
>>> To demonstrate, generate vmlinux BTF twice, once with a single thread
>>> and once with 2 threads.  We observe a different number of functions
>>> that end up in BTF for the exact same object:
>>>
>>> $ LLVM_OBJCOPY=objcopy pahole -J -j1 --btf_gen_floats --lang_exclude=rust .tmp_vmlinux.btf
>>> $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|wc -l
>>> 57596
>>>
>>> $ LLVM_OBJCOPY=objcopy pahole -J -j2 --btf_gen_floats --lang_exclude=rust .tmp_vmlinux.btf
>>> $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|wc -l
>>> 57714
>>>
>>> So we see 118 more functions in the latter case. Why would this differ? If we sort
>>> and strip out duplicates, we see that the cause is multiple copies of functions:
>>>
>>> $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|awk '{print $3}'|sort|uniq|wc -l
>>> 57596
>>>
>>> Once clue is that each encoder maintains a function table of ELF symbols, and
>>> marks a symbol as generated if it has been added to BTF.  As the encoder
>>> traverses CUs, it adds to the encoder symbol table.  In a single-threaded encoding,
>>> we will see only one instance of a function in the final BTF because the first
>>> match of the binary seach of the function list will be returned.  With multiple
>>> BTF encoders however - each with its own internal representation of the symbol
>>> table - there will be multiple instances of functions added to each individual
>>> encoder's BTF fragment.  This is supposed to be okay because BTF deduplication
>>> will handle merging these.
>>>
>>> However, this does not take into account the fact that the same function name
>>> may live in different CUs as a static function with a different function signature.
>>> Could this explain the extra functions?
>>
>> Could be as you explained below. Also, the original behavior for static function with one thread seems wrong. User only find one static function
>> definition in BTF and may assume it is the one user expected. But this may not be the case since other same-name static functions may be
>> simply ignored.
>>
> 
> Right, I should have been clearer on this; the single-threaded case isn't
> right either, it just picks the first instance of a function it finds.
> 
>>>
>>> It seems to account for many of them. To take an example tcp_in_window()
>>> has two representations:
>>>
>>> 122435] FUNC_PROTO '(anon)' ret_type_id=122409 vlen=7
>>>           'ct' type_id=65074
>>>           'dir' type_id=2170
>>>           'index' type_id=6
>>>           'skb' type_id=2012
>>>           'dataoff' type_id=6
>>>           'tcph' type_id=61910
>>>           'hook_state' type_id=29004
>>> [122436] FUNC 'tcp_in_window' type_id=122435 linkage=static
>>>
>>> [66683] FUNC_PROTO '(anon)' ret_type_id=372 vlen=4
>>>           'seq' type_id=23
>>>           'end_seq' type_id=23
>>>           's_win' type_id=23
>>>           'e_win' type_id=23
>>> [66684] FUNC 'tcp_in_window' type_id=66683 linkage=static
>>>
>>> ...one from nf_conntrack_proto_tcp.c, the other from tcp_minisocks.c.
>>>
>>> This raises the conceptual problem - what do we do with such cases?
>>>   From a code perspective, it's totally fine to have conflicting static
>>> functions in different CUs, and one approach would be to retain multiple
>>> conflicting function signatures; however this is not really useful as
>>> there is no mapping from BTF function signature to site.  As a result
>>> we have no way of knowing which signature applies to which function site.
>>> So perhaps the best approach is to eliminate such inconsistent static
>>> function descriptions? The actual amount is small, ~100 functions.
>>
>> Removing these inconsistent static functions could be a simpler
>> approach.
> 
> I took that approach with
> 
> https://github.com/acmel/dwarves/commit/80eaecdb00b3d79becc2133b854593277093b115
> 
> Static functions with inconsistent prototypes are left out of
> BTF encoding. Given that the numbers here are pretty low (around
> 100 or so, not including .isra functions which have inconsistent
> prototypes due to optimizations), that seems to be the simplest
> solution for now.

Yes, I am okay with this approach. This will prevent user mistaking
kprobe a wrong function and this is what we want for now.

> 
>> To really resolve this issue, BTF needs to encode more
>> information, e.g., DW_AT_low_pc, so this function can be related to a particular ksym. We could extend BTF somehow to encode this information.
>> One possible is to utilize the btf_decl_tag. For same-name static
>> functions, we could have
>>      btf_decl_tag("static_func:name:<low_pc_1>") -> foo (first static func in btf)
>>      btf_decl_tag("static_func:name:<low_pc_2>") -> foo (second static func in btf)
>> We only need to do this for static functions which have same names.
>>
>> In verifier, we could build the above relationship and establish
>>     btf_id for first foo -> low_pc_1
>>     btf_id for second foo -> low_pc_2
>> so verifier can then find the correct ksym addr for a particular
>> btf_id for 'foo'.
>>
> 
> It seems workable technically, but I wonder how much benefit it
> provides? The absolute number of static functions with conflicting

As I mentioned in the above, we can do the minimum work in the
beginning to prevent user errors. I agree that it may not be
worthwhile to support those just ~100 static functions with
more BTF types and kernel codes. Users already have alternatives.

> prototypes is small (around 100), and the usability of the above
> would be hard to get right. My suggestion would be that if we
> can ensure that BTF encoding will not encode inconsistent static
> function representations, we could potentially do something like

I assume consistent static function representations means
total match of function signature (return type, func name,
parameter type, parameter name).

> the following on attach:
> 
> - a BPF program specifies "fentry/foo"
> - we find a representation of foo in BTF
> - we notice that there are multiple instances of foo in kallsyms
> - because we know BTF would not have encoded foo if these had
>    inconsistent prototypes, we can safely attach to all those
>    instances

This should work. But if we do find multiple instances of
the same function. I think it would be good if we can spend
some time find out why dedup didn't catch them.

> 
> This seems to provide the least surprise for the user; i.e.
> I attached to a function, and my program fired when it ran. If
> the function has inconsistent representations, it isn't present
> in BTF so we cannot attach safely.
> 
>>>
>>> Assuming that makes sense, the next question is how.  One approach
>>> would be at BTF deduplication time, but we have seen cases in the
>>> past where BTF did not fully deduplicate identical structures, and
>>> in such cases, multiple copies of a function are present which
>>> have multiple (but identical) parameter type descriptions. If this
>>> happens, the danger in eliminating both is we might eliminate
>>> critical kernel functions due to a type deduplication issue. One
>>> way to avoid this would be a comparison of number of parameters
>>> and (failing that) parameter names; such a comparison would not
>>> be subject to issues with identical types leading to identical
>>> function definitions. Not 100% foolproof, but would work in
>>> nearly all cases I suspect.
>>>
>>> Another approach would be to re-use the mechanics introduced to
>>> compare static functions to see if they have optimized-out
>>> parameters to compare function signatures.  So static functions
>>> paradoxically have to be stored in a global tree and compared
>>> to weed out inconsistent duplicates.  Luckily, these comparisons
>>> can be quite superficial; for the most part number of arguments
>>> suffices.
>>
>> If I understand correctly, for same-name static functions,
>> we would like compare # of parameters
>> first. If they are the same, we then compare parameter names.
>> <name>.isra.<num> functions will compare against <name> func
>> to ensure the number of parameters the same. I am not 100%
>> sure what is the difference between the above two approaches.
>> Only implementation difference, right?
>>
> 
> Right, the only question is when we eliminate the inconsistencies
> (prior to BTF dedup or during it).
> 
> We need the mechanics for comparing static functions to handle the .isra
> case, so I went with re-using that scheme to catch inconsistent static
> functions in
> 
> https://github.com/acmel/dwarves/commit/80eaecdb00b3d79becc2133b854593277093b115

I am okay with this mechanism.

> 
> Thanks!
> 
> Alan
> 
>>>
>>> What do you think is the best way forward to solve this problem?
>>> The optimization-handling code gives us a way to deal with this
>>> by postponing addition of such functions until we can guarantee
>>> consistency, and I've roughly prototyped a patch on top of [1]
>>> that works, but I think first we need to figure out what to do
>>> with such inconsistent static functions before determining how we
>>> do it.  What do folks think?
>>>
>>> Alan
>>>
>>> [1] https://github.com/acmel/dwarves/compare/master...alan-maguire:dwarves:btf-optimized