Message-ID: <03e16727-7a0a-0e1e-e9b9-c947a64becb9@meta.com>
Date: Fri, 13 Jan 2023 00:00:55 -0800
From: Yonghong Song <yhs@...a.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Alan Maguire <alan.maguire@...cle.com>
Cc: Menglong Dong <menglong8.dong@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
Menglong Dong <imagedong@...cent.com>
Subject: Re: [PATCH] libbpf: resolve kernel function name optimization for
kprobe
On 1/12/23 1:07 PM, Alexei Starovoitov wrote:
> On Thu, Jan 12, 2023 at 2:20 AM Alan Maguire <alan.maguire@...cle.com> wrote:
>>
>> On 12/01/2023 07:23, Yonghong Song wrote:
>>>
>>>
>>> On 1/9/23 7:11 PM, Menglong Dong wrote:
>>>> On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@...a.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 1/9/23 1:42 AM, menglong8.dong@...il.com wrote:
>>>>>> From: Menglong Dong <imagedong@...cent.com>
>>>>>>
>>>>>> The function name in the kernel may be changed by the compiler. For example,
>>>>>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'.
>>>>>>
>>>>>> This kind of optimization can happen to any kernel function. Therefore, we
>>>>>> should consider this case.
>>>>>>
>>>>>> If attaching the kprobe fails with '-ENOENT', we can look up
>>>>>> kallsyms, check whether there is a similar function name ending
>>>>>> with '.xxx', and retry.
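A minimal sketch of the kallsyms lookup step proposed above, assuming the input is text in /proc/kallsyms format (`<addr> <type> <name>`, optionally followed by a tab and `[module]`). The helper `find_suffixed_sym()` is hypothetical, not a libbpf function, and it returns only the first candidate. A match alone does not guarantee that the compiler-generated clone kept the original function signature.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libbpf): scan text in /proc/kallsyms
 * format and copy the first symbol named "<func>.<suffix>" (for example
 * "ip_rcv_core.isra.0") into out. Returns 0 on success, -1 if no such
 * symbol exists or it does not fit in out.
 */
static int find_suffixed_sym(const char *kallsyms, const char *func,
			     char *out, size_t out_len)
{
	size_t func_len = strlen(func);
	const char *line = kallsyms;

	while (line && *line) {
		char name[256];
		unsigned long long addr;
		char type;

		/* %255s stops at the tab before an optional "[module]". */
		if (sscanf(line, "%llx %c %255s", &addr, &type, name) == 3 &&
		    strncmp(name, func, func_len) == 0 &&
		    name[func_len] == '.') {
			if (strlen(name) >= out_len)
				return -1;
			strcpy(out, name);
			return 0;
		}
		/* Advance to the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

On success, the retry could pass the found name to libbpf's `bpf_program__attach_kprobe(prog, false, name)`.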
>>>>>
>>>>> This might produce incorrect results, so this approach won't work
>>>>> for all .isra.0 cases. When a function name is changed from
>>>>> <func> to <func>.isra.<num>, the compiler may have
>>>>> made some changes to the arguments, e.g., removing an argument,
>>>>> changing the semantics of an argument, etc. If the bpf program still
>>>>> uses the original function signature, it may
>>>>> produce unexpected results.
>>>>
>>>> Oops, I wasn't aware of this part. Can we disable this feature
>>>> by default and offer users an option to enable it, such as:
>>>>
>>>> bpf_object_adapt_sym(struct bpf_object *obj)
>>>>
>>>> In my case, kernel function renaming is common, and I have to
>>>> check all functions and do such adaptation before attaching
>>>> my kprobe programs, which means I can't use auto-attach.
>>>>
>>>> What's more, I haven't seen any argument changes so far;
>>>> maybe it's not a common case?
>>>
>>> I don't have statistics, but it happens. In general, if you
>>> want to attach to a function like <foo> that has a variant
>>> <foo>.isra.<num>, you should probably check the assembly code
>>> to ensure the parameter semantics have not changed; then
>>> you can attach a kprobe to <foo>.isra.<num>, which
>>> I assume the current libbpf infrastructure already supports.
>>> After you investigate all these <foo>.isra.<num> functions
>>> and confirm their argument semantics haven't changed, you
>>> could use kprobe multi to do the attachment.
>>>
>>
>> I crunched some numbers on this and discovered that out of ~1600
>> .isra/.constprop functions, 76 had a missing argument. The patch series
>> at [1] is a rough attempt to get pahole to spot these, and add
>> BTF entries for each, where the BTF representation reflects
>> reality by skipping optimized-out arguments. So for a function
>> like
>>
>> static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg,
>>                                const struct in6_addr *gw_addr, u32 tbid,
>>                                int flags, struct fib6_result *res);
>>
>> Examining the BTF representation using pahole from [1], we see
>>
>> int ip6_nh_lookup_table.isra.0(struct net *net, struct fib6_config *cfg, struct in6_addr *gw_addr, u32 tbid, int flags);
>>
>> Compared to the definition, the last parameter (the
>> "struct fib6_result *" argument) is missing. The calling pattern,
>> where the callers have a struct fib6_result on the stack and pass a pointer,
>> is reflected in late DWARF info, which shows the argument is not actually
>> passed in a register but can be expressed as an offset relative to the current
>> function stack (DW_OP_fbreg).
>>
>> This approach, however, introduces the problem that the kernel currently
>> doesn't allow a "." in a function name. We can fix that, but any BTF encoding
>> that introduced optimized functions containing a "." would have to be opt-in
>> via a pahole option, so we do not generate invalid vmlinux BTF for kernels
>> without that change.
>>
>> An alternative approach would be to simply encode .isra functions
>> in BTF without the .isra suffix (i.e. using "function_name" not
>> "function_name.isra"), only doing the BTF encoding if no arguments were
>> optimized out - i.e. if the function signature matches expectations.
>> The 76 functions with optimized-out parameters could simply be skipped.
>> To me that feels like the simpler approach - it avoids issues
>> with function name BTF encoding, and with that sort of model a
>> loose-matching kallsyms approach - like that described here - could be used
>> for kprobes and fentry/fexit. It also fits with the DWARF representation -
>> the .isra suffixes are not present in DWARF representations of the function,
>> only in the symbol table and kallsyms, so perhaps BTF should follow suit
>> and not add the suffixes. What do you think?
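A sketch of the suffix-free naming rule in the second approach, assuming the compiler suffix always begins at the first '.' (true for .isra.<num> and .constprop.<num>, since C identifiers cannot contain '.'). `btf_base_name()` is a hypothetical helper, not pahole code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: derive the BTF name for a symbol such as
 * "ip6_nh_lookup_table.isra.0" by dropping everything from the first
 * '.'. Returns the length of the base name, or 0 if it does not fit
 * (including the terminating NUL) in out.
 */
static size_t btf_base_name(const char *sym, char *out, size_t out_len)
{
	size_t n = strcspn(sym, ".");	/* length up to the first '.' */

	if (n + 1 > out_len)
		return 0;
	memcpy(out, sym, n);
	out[n] = '\0';
	return n;
}
```

One design consequence: several clones of the same function (say a .isra and a .constprop variant) would map to the same base name, so an encoder would likely need to pick one of them or skip the function.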
>
> Sounds like a great idea to me.
> Addresses this issue in a clean way.
Yes, the second approach seems reasonable. If the number
of parameters of the *actual* function equals the number
of parameters of the defined function (abstract_origin),
we can roughly assume the actual function signature matches
the prototype. It is theoretically possible that the
compiler might change parameter types, e.g., from a
struct pointer (struct foo *p) to an int value (p->field1),
but this should be extremely rare, and we would need the
compiler to emit additional dwarf data (possibly through
btf_decl_tag) to discover such cases.
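The parameter-count heuristic can be sketched as below. `struct func_sig` and `can_encode_unsuffixed()` are hypothetical simplifications that assume the parameter counts of the clone and its abstract_origin were already extracted from DWARF; real pahole code works on its own DWARF/BTF structures.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one function's signature. */
struct func_sig {
	const char *name;
	int nr_params;
};

/*
 * Encode the clone under its unsuffixed name only if it still takes as
 * many parameters as its abstract origin. This cannot detect a changed
 * parameter type (e.g. struct foo *p replaced by an int p->field1).
 */
static bool can_encode_unsuffixed(const struct func_sig *origin,
				  const struct func_sig *clone)
{
	return clone->nr_params == origin->nr_params;
}
```

With the ip6_nh_lookup_table example from earlier in the thread (6 declared parameters, 5 in the .isra.0 clone), the check returns false, so that clone would be skipped.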