[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <694E52E0-C9E5-49D2-A677-09A5EE442590@zytor.com>
Date: Wed, 05 Mar 2025 00:29:45 -0800
From: "H. Peter Anvin" <hpa@...or.com>
To: Menglong Dong <menglong8.dong@...il.com>,
Peter Zijlstra <peterz@...radead.org>
CC: rostedt@...dmis.org, mark.rutland@....com, alexei.starovoitov@...il.com,
catalin.marinas@....com, will@...nel.org, mhiramat@...nel.org,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org, martin.lau@...ux.dev,
eddyz87@...il.com, yonghong.song@...ux.dev, john.fastabend@...il.com,
kpsingh@...nel.org, sdf@...ichev.me, jolsa@...nel.org,
davem@...emloft.net, dsahern@...nel.org,
mathieu.desnoyers@...icios.com, nathan@...nel.org,
nick.desaulniers+lkml@...il.com, morbo@...gle.com,
samitolvanen@...gle.com, kees@...nel.org, dongml2@...natelecom.cn,
akpm@...ux-foundation.org, riel@...riel.com, rppt@...nel.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, bpf@...r.kernel.org,
netdev@...r.kernel.org, llvm@...ts.linux.dev
Subject: Re: [PATCH v4 1/4] x86/ibt: factor out cfi and fineibt offset
On March 4, 2025 5:19:09 PM PST, Menglong Dong <menglong8.dong@...il.com> wrote:
>On Tue, Mar 4, 2025 at 10:53 PM H. Peter Anvin <hpa@...or.com> wrote:
>>
>> On March 4, 2025 1:42:20 AM PST, Peter Zijlstra <peterz@...radead.org> wrote:
>> >On Tue, Mar 04, 2025 at 03:47:45PM +0800, Menglong Dong wrote:
>> >> We don't have to select FUNCTION_ALIGNMENT_32B, so the
>> >> worst case is to increase ~2.2%.
>> >>
>> >> What do you think?
>> >
>> >Well, since I don't understand what you need this for at all, I'm firmly
>> >on the side of not doing this.
>> >
>> >What actual problem is being solved with this meta data nonsense? Why is
>> >it worth blowing up our I$ footprint over.
>> >
>> >Also note, that if you're going to be explaining this, start from
>> >scratch, as I have absolutely 0 clues about BPF and such.
>>
>> I would appreciate such information as well. The idea seems dubious on the surface.
>
>Ok, let me explain it from the beginning. (My English is not good,
>but I'll try to describe it as clear as possible :/)
>
>Many BPF program types need to depend on the BPF trampoline,
>such as BPF_PROG_TYPE_TRACING, BPF_PROG_TYPE_EXT,
>BPF_PROG_TYPE_LSM, etc. BPF trampoline is a bridge between
>the kernel (or bpf) function and BPF program, and it acts just like the
>trampoline that ftrace uses.
>
>Generally speaking, it is used to hook a function, just like what ftrace
>do:
>
>foo:
> endbr
> nop5 --> call trampoline_foo
> xxxx
>
>In short, the trampoline_foo can be this:
>
>trampoline_foo:
> prepare a array and store the args of foo to the array
> call fentry_bpf1
> call fentry_bpf2
> ......
> call foo+4 (origin call)
> save the return value of foo
> call fexit_bpf1 (this bpf can get the return value of foo)
> call fexit_bpf2
> .......
> return to the caller of foo
>
>We can see that the trampoline_foo can be only used for
>the function foo, as different kernel function can be attached
>different BPF programs, and have different argument count,
>etc. Therefore, we have to create 1000 BPF trampolines if
>we want to attach a BPF program to 1000 kernel functions.
>
>The creation of the BPF trampoline is expensive. According to
>my testing, It will spend more than 1 second to create 100 bpf
>trampoline. What's more, it consumes more memory.
>
>If we have the per-function metadata supporting, then we can
>create a global BPF trampoline, like this:
>
>trampoline_global:
> prepare a array and store the args of foo to the array
> get the metadata by the ip
> call metadata.fentry_bpf1
> call metadata.fentry_bpf2
> ....
> call foo+4 (origin call)
> save the return value of foo
> call metadata.fexit_bpf1 (this bpf can get the return value of foo)
> call metadata.fexit_bpf2
> .......
> return to the caller of foo
>
>(The metadata holds more information for the global trampoline than
>I described.)
>
>Then, we don't need to create a trampoline for every kernel function
>anymore.
>
>Another beneficiary can be ftrace. For now, all the kernel functions that
>are enabled by dynamic ftrace will be added to a filter hash if there are
>more than one callbacks. And hash lookup will happen when the traced
>functions are called, which has an impact on the performance, see
>__ftrace_ops_list_func() -> ftrace_ops_test(). With the per-function
>metadata supporting, we can store the information that if the callback is
>enabled on the kernel function to the metadata, which can make the performance
>much better.
>
>The per-function metadata storage is a basic function, and I think there
>may be other functions that can use it for better performance in the feature
>too.
>
>(Hope that I'm describing it clearly :/)
>
>Thanks!
>Menglong Dong
>
This is way too cursory. For one thing, you need to start by explaining why you are asking to put this *inline* with the code, which is something that normally would be avoided at all cost.
Powered by blists - more mailing lists