[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <eaf64a4b-85ab-bb5a-5c33-d7cd80b034e7@meta.com>
Date: Wed, 18 Jan 2023 10:15:26 -0800
From: Yonghong Song <yhs@...a.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Alan Maguire <alan.maguire@...cle.com>
Cc: Menglong Dong <menglong8.dong@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
Menglong Dong <imagedong@...cent.com>
Subject: Re: [PATCH] libbpf: resolve kernel function name optimization for
kprobe
On 1/13/23 12:00 AM, Yonghong Song wrote:
>
>
> On 1/12/23 1:07 PM, Alexei Starovoitov wrote:
>> On Thu, Jan 12, 2023 at 2:20 AM Alan Maguire <alan.maguire@...cle.com>
>> wrote:
>>>
>>> On 12/01/2023 07:23, Yonghong Song wrote:
>>>>
>>>>
>>>> On 1/9/23 7:11 PM, Menglong Dong wrote:
>>>>> On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@...a.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 1/9/23 1:42 AM, menglong8.dong@...il.com wrote:
>>>>>>> From: Menglong Dong <imagedong@...cent.com>
>>>>>>>
>>>>>>> The function name in kernel may be changed by the compiler. For
>>>>>>> example,
>>>>>>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'.
>>>>>>>
>>>>>>> This kind optimization can happen in any kernel function.
>>>>>>> Therefor, we
>>>>>>> should conside this case.
>>>>>>>
>>>>>>> If we failed to attach kprobe with a '-ENOENT', then we can
>>>>>>> lookup the
>>>>>>> kallsyms and check if there is a similar function end with
>>>>>>> '.xxx', and
>>>>>>> retry.
>>>>>>
>>>>>> This might produce incorrect result, so this approach won't work
>>>>>> for all .isra.0 cases. When a function name is changed from
>>>>>> <func> to <func>.isra.<num>, it is possible that compiler may have
>>>>>> make some changes to the arguments, e.g., removing one argument,
>>>>>> chaning a semantics of argument, etc. if bpf program still
>>>>>> uses the original function signature, the bpf program may
>>>>>> produce unexpected result.
>>>>>
>>>>> Oops, I wasn't aware of this part. Can we make this function disabled
>>>>> by default and offer an option to users to enable it? Such as:
>>>>>
>>>>> bpf_object_adapt_sym(struct bpf_object *obj)
>>>>>
>>>>> In my case, kernel function rename is common, and I have to
>>>>> check all functions and do such adaptation before attaching
>>>>> my kprobe programs, which makes me can't use auto-attach.
>>>>>
>>>>> What's more, I haven't seen the arguments change so far, and
>>>>> maybe it's not a common case?
>>>>
>>>> I don't have statistics, but it happens. In general, if you
>>>> want to attach to a function like <foo>, but it has a variant
>>>> <foo>.isra.<num>, you probably should check assembly code
>>>> to ensure the parameter semantics not changed, and then
>>>> you can attach to kprobe function <foo>.isra.<num>, which
>>>> I assume current libbpf infrastructure should support it.
>>>> After you investigate all these <foo>.isra.<num> functions
>>>> and confirm their argument semantics won't change, you
>>>> could use kprobe multi to do attachment.
>>>>
>>>
>>> I crunched some numbers on this, and discovered out of ~1600
>>> .isra/.constprop functions, 76 had a missing argument. The patch series
>>> at [1] is a rough attempt to get pahole to spot these, and add
>>> BTF entries for each, where the BTF representation reflects
>>> reality by skipping optimized-out arguments. So for a function
>>> like
>>>
>>> static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg,
>>> const struct in6_addr *gw_addr, u32
>>> tbid,
>>> int flags, struct fib6_result *res);
>>>
>>> Examining the BTF representation using pahole from [1], we see
>>>
>>> int ip6_nh_lookup_table.isra.0(struct net *net, struct fib6_config
>>> *cfg, struct in6_addr *gw_addr, u32 tbid, int flags);
>>>
>>> Comparing to the definition, we see the last parameter is missing,
>>> i.e. the "struct fib6_result *" argument is missing. The calling
>>> pattern -
>>> where the callers have a struct fib6_result on the stack and pass a
>>> pointer -
>>> is reflected in late DWARF info which shows the argument is not actually
>>> passed as a register, but can be expressed as an offset relative to
>>> the current
>>> function stack (DW_OP_fbreg).
>>>
>>> This approach howvever introduces the problem that currently the kernel
>>> doesn't allow a "." in a function name. We can fix that, but any BTF
>>> encoding
>>> that introduced optimized functions containing a "." would have to
>>> be opt-in
>>> via a pahole option, so we do not generate invalid vmlinux BTF for
>>> kernels
>>> without that change.
>>>
>>> An alternative approach would be to simply encode .isra functions
>>> in BTF without the .isra suffix (i.e. using "function_name" not
>>> "function_name.isra"), only doing the BTF encoding if no arguments were
>>> optimized out - i.e. if the function signature matches expectations.
>>> The 76 functions with optimized-out parameters could simply be skipped.
>>> To me that feels like the simpler approach - it avoids issues
>>> with function name BTF encoding, and with that sort of model a
>>> loose-matching kallsyms approach - like that described here - could
>>> be used
>>> for kprobes and fentry/fexit. It also fits with the DWARF
>>> representation -
>>> the .isra suffixes are not present in DWARF representations of the
>>> function,
>>> only in the symbol table and kallsyms, so perhaps BTF should follow suit
>>> and not add the suffixes. What do you think?
>>
>> Sounds like a great idea to me.
>> Addresses this issue in a clean way.
>
> Yes, the second approach seems a reasonable approach. If the number
> of parameters for the *actual* functions equals to the number
> of parameters for the defined function (abstract_origin),
> we can roughly assume the actual function signature matches
> the prototype. Although it is theoretically possible that
> compiler might change parameter types, e.g., from a
> struct pointer (struct foo *p) to a int value (p->field1).
> But this should be extremely rare and we need compiler emitting
> additional dwarf data (might through btf_decl_tag) to discover
> such cases.
I checked with some compiler guys at Meta. They confirmed that clang
might have the same optimization (eliminating some function parameters
for static functions), but clang won't change function linkage name.
So that means, clang won't do static function cloning and it will
only remove function parameters if this can be applied to all call sites.
I checked the clang (clang16) build kernel with latest bpf-next
and didn't find a single instance that a static function's parameter
is removed.
Also, current libbpf kprobe API supports to pass a function address
to kernel. So user space can always do their own /proc/kallsyms search
and find func address for either regular function or function
with .isa.<num>/.constprop.<num> suffixes.
Powered by blists - more mailing lists