[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7f770a27-6ca6-463f-9145-5c795e0b3f40@oracle.com>
Date: Tue, 14 Oct 2025 09:05:50 +0100
From: Alan Maguire <alan.maguire@...cle.com>
To: Donglin Peng <dolinux.peng@...il.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Andrii Nakryiko <andrii.nakryiko@...il.com>,
Andrii Nakryiko <andrii@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-trace-kernel <linux-trace-kernel@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, Eduard Zingerman <eddyz87@...il.com>,
Alexei Starovoitov <ast@...nel.org>, Song Liu <song@...nel.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Steven Rostedt
<rostedt@...dmis.org>,
pengdonglin <pengdonglin@...omi.com>
Subject: Re: [RFC PATCH v1] btf: Sort BTF types by name and kind to optimize
btf_find_by_name_kind lookup
On 14/10/2025 05:53, Donglin Peng wrote:
> On Tue, Oct 14, 2025 at 10:48 AM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
>>
>> On Mon, Oct 13, 2025 at 6:54 PM Donglin Peng <dolinux.peng@...il.com> wrote:
>>>
>>> On Tue, Oct 14, 2025 at 8:22 AM Alexei Starovoitov
>>> <alexei.starovoitov@...il.com> wrote:
>>>>
>>>> On Mon, Oct 13, 2025 at 5:15 PM Andrii Nakryiko
>>>> <andrii.nakryiko@...il.com> wrote:
>>>>>
>>>>> On Mon, Oct 13, 2025 at 4:53 PM Alexei Starovoitov
>>>>> <alexei.starovoitov@...il.com> wrote:
>>>>>>
>>>>>> On Mon, Oct 13, 2025 at 4:40 PM Andrii Nakryiko
>>>>>> <andrii.nakryiko@...il.com> wrote:
>>>>>>>
>>>>>>> Just a few observations (if we decide to do the sorting of BTF by name
>>>>>>> in the kernel):
>>>>>>
>>>>>> iirc we discussed it in the past and decided to do sorting in pahole
>>>>>> and let the kernel verify whether it's sorted or not.
>>>>>> Then no extra memory is needed.
>>>>>> Or was that idea discarded for some reason?
>>>>>
>>>>> Don't really remember at this point, tbh. Pre-sorting should work
>>>>> (though I'd argue that then we should only sort by name to make this
>>>>> sorting universally useful, doing linear search over kinds is fast,
>>>>> IMO). Pre-sorting won't work for program BTFs, don't know how
>>>>> important that is. This indexing on demand approach would be
>>>>> universal. ¯\_(ツ)_/¯
>>>>>
>>>>> Overall, paying 300KB for sorted index for vmlinux BTF for cases where
>>>>> we repeatedly need this seems ok to me, tbh.
>>>>
>>>> If pahole sorting works I don't see why consuming even 300k is ok.
>>>> kallsyms are sorted during the build too.
>>>
>>> Thanks. We did discuss pre-sorting in pahole in the threads:
>>>
>>> https://lore.kernel.org/all/CAADnVQLMHUNE95eBXdy6=+gHoFHRsihmQ75GZvGy-hSuHoaT5A@mail.gmail.com/
>>> https://lore.kernel.org/all/CAEf4BzaXHrjoEWmEcvK62bqKuT3de__+juvGctR3=e8avRWpMQ@mail.gmail.com/
>>>
>>> However, since that approach depends on newer pahole features and
>>> btf_find_by_name_kind is already being called quite frequently, I suggest
>>> we first implement sorting within the kernel, and subsequently add pre-sorting
>>> support in pahole.
>>
>> and then what? Remove it from the kernel when pahole is newer?
>> I'd rather not do this churn in the first place.
>
> Apologies for the formatting issues in my previous email—sending this again
> for clarity.
>
> Thank you for your feedback. Your concerns are completely valid.
>
> I’d like to suggest a dual-mechanism approach:
> 1. If BTF is generated by a newer pahole (with pre-sorting support), the
> kernel would use the pre-sorted data directly.
> 2. For BTF from older pahole versions, the kernel would handle sorting
> at load time or later.
>
> This would provide performance benefits immediately while preserving
> backward compatibility. The kernel-side sorting would remain intact
> moving forward, avoiding future churn.
>
If you're taking the approach of doing both - which is best I think -
I'd suggest it might be helpful to look at the bpf_relocate.c code; it's
shared between libbpf and kernel, so you could potentially add shared
code to do sorting in libbpf (which pahole would use) and the kernel
would use too; this would help ensure the behaviour is identical.
Maybe for pahole/libbpf sorting could be done via a new BTF dedup()
option, since dedup is the time we finalize the BTF representation?
Alan
Powered by blists - more mailing lists