[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAErzpmvP41CNQhRVKuDU23xnBKjj239R6_e5K8DSwcNDo7GG5Q@mail.gmail.com>
Date: Sat, 22 Nov 2025 15:19:53 +0800
From: Donglin Peng <dolinux.peng@...il.com>
To: Eduard Zingerman <eddyz87@...il.com>
Cc: Andrii Nakryiko <andrii.nakryiko@...il.com>, ast@...nel.org, zhangxiaoqin@...omi.com,
linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
Donglin Peng <pengdonglin@...omi.com>, Alan Maguire <alan.maguire@...cle.com>,
Song Liu <song@...nel.org>
Subject: Re: [RFC PATCH v7 5/7] libbpf: Implement BTF type sorting validation
for binary search optimization
On Sat, Nov 22, 2025 at 3:07 AM Eduard Zingerman <eddyz87@...il.com> wrote:
>
> On Thu, 2025-11-20 at 15:25 +0800, Donglin Peng wrote:
> > On Thu, Nov 20, 2025 at 3:50 AM Andrii Nakryiko
> > <andrii.nakryiko@...il.com> wrote:
> > >
> > > On Tue, Nov 18, 2025 at 7:21 PM Donglin Peng <dolinux.peng@...il.com> wrote:
> > > >
> > > > From: Donglin Peng <pengdonglin@...omi.com>
> > > >
> > > > This patch adds validation to verify BTF type name sorting, enabling
> > > > binary search optimization for lookups.
> > > >
> > > > Cc: Eduard Zingerman <eddyz87@...il.com>
> > > > Cc: Alexei Starovoitov <ast@...nel.org>
> > > > Cc: Andrii Nakryiko <andrii.nakryiko@...il.com>
> > > > Cc: Alan Maguire <alan.maguire@...cle.com>
> > > > Cc: Song Liu <song@...nel.org>
> > > > Cc: Xiaoqin Zhang <zhangxiaoqin@...omi.com>
> > > > Signed-off-by: Donglin Peng <pengdonglin@...omi.com>
> > > > ---
> > > > tools/lib/bpf/btf.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 59 insertions(+)
> > > >
> > > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > > > index 1d19d95da1d0..d872abff42e1 100644
> > > > --- a/tools/lib/bpf/btf.c
> > > > +++ b/tools/lib/bpf/btf.c
> > > > @@ -903,6 +903,64 @@ int btf__resolve_type(const struct btf *btf, __u32 type_id)
> > > > return type_id;
> > > > }
> > > >
> > > > +/* Anonymous types (with empty names) are considered greater than named types
> > > > + * and are sorted after them. Two anonymous types are considered equal. Named
> > > > + * types are compared lexicographically.
> > > > + */
> > > > +static int btf_compare_type_names(const void *a, const void *b, void *priv)
> > > > +{
> > > > + struct btf *btf = (struct btf *)priv;
> > > > + struct btf_type *ta = btf_type_by_id(btf, *(__u32 *)a);
> > > > + struct btf_type *tb = btf_type_by_id(btf, *(__u32 *)b);
> > > > + const char *na, *nb;
> > > > + bool anon_a, anon_b;
> > > > +
> > > > + na = btf__str_by_offset(btf, ta->name_off);
> > > > + nb = btf__str_by_offset(btf, tb->name_off);
> > > > + anon_a = str_is_empty(na);
> > > > + anon_b = str_is_empty(nb);
> > > > +
> > > > + if (anon_a && !anon_b)
> > > > + return 1;
> > > > + if (!anon_a && anon_b)
> > > > + return -1;
> > > > + if (anon_a && anon_b)
> > > > + return 0;
> > >
> > > any reason to hard-code that anonymous types should come *after* named
> > > ones? That requires custom comparison logic here and resolve_btfids,
> > > instead of just relying on btf__str_by_offset() returning valid empty
> > > string for name_off == 0 and then sorting anon types before named
> > > ones, following normal lexicographical sorting rules?
> >
> > Thanks. I found that some kernel functions like btf_find_next_decl_tag,
> > bpf_core_add_cands, find_bpffs_btf_enums, and find_btf_percpu_datasec
> > still use linear search.
>
> - btf_find_next_decl_tag() - this function is called from:
> - btf_find_decl_tag_value(), here whole scan over all BTF types is
> guaranteed to happen (because btf_find_next_decl_tag() is called
> twice);
> - btf_prepare_func_args(), here again whole scan is guaranteed to
> happen, because of the while loop starting from id == 0.
Right.
> - bpf_core_add_cands() this function is called from
> bpf_core_find_cands(), where it does a linear scan over all types in
> kernel BTF and then a linear scan over all types in module BTFs.
> (Because of how targ_start_id parameter is passed).
Right.
> - find_bpffs_btf_enums() - this function does a linear scan over all
> types in module BTFs.
I think putting names ahead is helpful here, because there is a check
(info->cmd_t && info->map_t && info->prog_t && info->attach_t) to
return early. but I think it can be converted to use btf_find_by_name_kind.
> - find_btf_percpu_datasec() - this function looks for a DATASEC with
> name ".data..percpu" and returns as soon as the match is found.
>
> Of the 4 functions above only find_btf_percpu_datasec() will return
> early if BTF type with specified name is found. And it can be
> converted to use btf_find_by_name_kind().
Thanks. I’ve looked into find_btf_percpu_datasec and we can’t use
btf_find_by_name_kind here because the search scope differs. For
a module BTF, find_btf_percpu_datasec only searches within the
module’s own BTF, whereas btf_find_by_name_kind prioritizes
searching the base BTF first. Thus, placing named types ahead is
more effective here. Besides, I found that the '.data..percpu' named
type will be placed at [1] for vmlinux BTF because the prefix '.' is
smaller than any letter, so the linear search only requires one loop to
locate it. However, if we put named types at the end, it will need more
than 60,000 loops..
>
> So, it appears that there should not be any performance penalty
> (compared to current state of affairs) if anonymous types are put in
> front. Wdyt?
>
> > Putting named types first would also help here, as
> > it allows anonymous types to be skipped naturally during the search.
> > Some of them could be refactored to use btf_find_by_name_kind, but some
> > would not be appropriate, such as btf_find_next_decl_tag,
> > bpf_core_add_cands,find_btf_percpu_datasec.
>
> Did you observe any performance issue if anonymous types are put in
> the front?
No, I have not done such a test, but based on the above analysis, the
improvement
mainly comes from find_bpffs_btf_enums and find_btf_percpu_datasec.
>
> > Additionally, in the linear search branch, I saw there is a NULL check for
> > the name returned by btf__name_by_offset. This suggests that checking
> > name_off == 0 alone may not be sufficient to identify an anonymous type,
> > which is why I used str_is_empty for a more robust check.
> >
> > >
> > > [...]
Powered by blists - more mailing lists