Message-ID: <CAADnVQKP1-gdmq1xkogFeRM6o3j2zf0Q8Atz=aCEkB0PkVx++A@mail.gmail.com>
Date: Mon, 14 Jul 2025 19:25:22 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Menglong Dong <menglong8.dong@...il.com>
Cc: Steven Rostedt <rostedt@...dmis.org>, Jiri Olsa <jolsa@...nel.org>, bpf <bpf@...r.kernel.org>, 
	Menglong Dong <dongml2@...natelecom.cn>, "H. Peter Anvin" <hpa@...or.com>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>, 
	KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, 
	LKML <linux-kernel@...r.kernel.org>, Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next v2 02/18] x86,bpf: add bpf_global_caller for
 global trampoline

On Thu, Jul 3, 2025 at 5:17 AM Menglong Dong <menglong8.dong@...il.com> wrote:
>
> +static __always_inline void
> +do_origin_call(unsigned long *args, unsigned long *ip, int nr_args)
> +{
> +       /* Following code will be optimized by the compiler, as nr_args
> +        * is a const, and there will be no condition here.
> +        */
> +       if (nr_args == 0) {
> +               asm volatile(
> +                       RESTORE_ORIGIN_0 CALL_NOSPEC "\n"
> +                       "movq %%rax, %0\n"
> +                       : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> +                       : [args]"r"(args), [thunk_target]"r"(*ip)
> +                       :
> +               );
> +       } else if (nr_args == 1) {
> +               asm volatile(
> +                       RESTORE_ORIGIN_1 CALL_NOSPEC "\n"
> +                       "movq %%rax, %0\n"
> +                       : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> +                       : [args]"r"(args), [thunk_target]"r"(*ip)
> +                       : "rdi"
> +               );
> +       } else if (nr_args == 2) {
> +               asm volatile(
> +                       RESTORE_ORIGIN_2 CALL_NOSPEC "\n"
> +                       "movq %%rax, %0\n"
> +                       : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> +                       : [args]"r"(args), [thunk_target]"r"(*ip)
> +                       : "rdi", "rsi"
> +               );
> +       } else if (nr_args == 3) {
> +               asm volatile(
> +                       RESTORE_ORIGIN_3 CALL_NOSPEC "\n"
> +                       "movq %%rax, %0\n"
> +                       : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> +                       : [args]"r"(args), [thunk_target]"r"(*ip)
> +                       : "rdi", "rsi", "rdx"
> +               );
> +       } else if (nr_args == 4) {
> +               asm volatile(
> +                       RESTORE_ORIGIN_4 CALL_NOSPEC "\n"
> +                       "movq %%rax, %0\n"
> +                       : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> +                       : [args]"r"(args), [thunk_target]"r"(*ip)
> +                       : "rdi", "rsi", "rdx", "rcx"
> +               );
> +       } else if (nr_args == 5) {
> +               asm volatile(
> +                       RESTORE_ORIGIN_5 CALL_NOSPEC "\n"
> +                       "movq %%rax, %0\n"
> +                       : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> +                       : [args]"r"(args), [thunk_target]"r"(*ip)
> +                       : "rdi", "rsi", "rdx", "rcx", "r8"
> +               );
> +       } else if (nr_args == 6) {
> +               asm volatile(
> +                       RESTORE_ORIGIN_6 CALL_NOSPEC "\n"
> +                       "movq %%rax, %0\n"
> +                       : "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> +                       : [args]"r"(args), [thunk_target]"r"(*ip)
> +                       : "rdi", "rsi", "rdx", "rcx", "r8", "r9"
> +               );
> +       }
> +}

What is the performance difference between the 0-6 variants?
I would think saving/restoring all the regs shouldn't be that expensive.
The bpf trampoline saves only what's necessary because it can do
this micro-optimization, but for this one, I think, doing
_one_ global trampoline that covers all cases will simplify the code
a lot. Please benchmark the difference to understand
the trade-off.

The major simplification will be due to skipping nr_args.
There won't be a need to do btf model and count the args.
Just do one trampoline for them all.
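
A rough userspace model of the "one trampoline for them all" idea, not
kernel code: every register slot is saved and restored unconditionally, so
nothing depends on nr_args. The names saved_args, save_all_args, call_origin
and add_two are hypothetical and exist only for this sketch; a real
trampoline would do the save/restore in assembly.

```c
#include <assert.h>
#include <string.h>

/* x86-64 SysV passes up to six integer arguments in registers. */
#define MAX_REG_ARGS 6

/* Hypothetical stand-in for the argument area on the trampoline stack. */
struct saved_args {
	unsigned long regs[MAX_REG_ARGS];
	unsigned long ret;
};

/* Hypothetical origin type for this model: it sees all six slots. */
typedef unsigned long (*origin_fn)(const unsigned long *regs);

/* Save every slot unconditionally: no nr_args, no arg counting. */
static void save_all_args(struct saved_args *s, const unsigned long *live)
{
	memcpy(s->regs, live, sizeof(s->regs));
}

/* Restore all six slots and call the origin; unused slots are ignored. */
static void call_origin(struct saved_args *s, origin_fn fn)
{
	unsigned long live[MAX_REG_ARGS];

	memcpy(live, s->regs, sizeof(live));
	s->ret = fn(live);
}

/* Example origin that only looks at its first two arguments. */
static unsigned long add_two(const unsigned long *regs)
{
	return regs[0] + regs[1];
}
```

The cost of this shape is a few extra stores and loads for functions with
fewer than six arguments, which is the difference the benchmark would need
to quantify.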

Also funcs with 7+ arguments need to be thought through
from the start.
I think it's an OK trade-off if we allow the global trampoline
to be safe to attach to a function with 7+ args (and
it will not mess with the stack), but the bpf prog can only
access up to 6 args. The kfuncs to access arg 7 might be
more complex and slower. That's an OK trade-off.
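
A minimal sketch of that access limit, as a userspace model under stated
assumptions: tramp_ctx and tramp_get_arg are hypothetical names, and the
accessor simply refuses any index beyond the six register-passed arguments
instead of reaching into the caller's stack.

```c
#include <assert.h>
#include <errno.h>

/* Only the six register-passed arguments are exposed to the prog. */
#define MAX_PROG_ARGS 6

/* Hypothetical context holding the saved register arguments. */
struct tramp_ctx {
	unsigned long args[MAX_PROG_ARGS];
};

/*
 * Hypothetical accessor: returns 0 and stores the argument on success,
 * or -EINVAL for n >= 6 so that stack-passed arguments are never touched.
 */
static int tramp_get_arg(const struct tramp_ctx *ctx, unsigned int n,
			 unsigned long *value)
{
	if (n >= MAX_PROG_ARGS)
		return -EINVAL;
	*value = ctx->args[n];
	return 0;
}
```

Attaching to a function with 7+ args stays safe in this model because the
trampoline never reads or writes anything past the sixth slot.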

> +
> +static __always_inline notrace void
> +run_tramp_prog(struct kfunc_md_tramp_prog *tramp_prog,
> +              struct bpf_tramp_run_ctx *run_ctx, unsigned long *args)
> +{
> +       struct bpf_prog *prog;
> +       u64 start_time;
> +
> +       while (tramp_prog) {
> +               prog = tramp_prog->prog;
> +               run_ctx->bpf_cookie = tramp_prog->cookie;
> +               start_time = bpf_gtramp_enter(prog, run_ctx);
> +
> +               if (likely(start_time)) {
> +                       asm volatile(
> +                               CALL_NOSPEC "\n"
> +                               : : [thunk_target]"r"(prog->bpf_func), [args]"D"(args)
> +                       );

Why can't this be "call *(prog->bpf_func)" ?
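
A rough userspace sketch of the same loop written with a plain C indirect
call instead of inline asm. The types model_prog and model_tramp_prog and
the simplified bpf_func signature are assumptions for illustration only and
do not match the kernel's struct bpf_prog. In kernel builds with retpolines
enabled, the compiler is generally expected to emit the thunk for such a
call on its own, though that should be confirmed for this code path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified program: the real bpf_func signature differs. */
struct model_prog {
	unsigned int (*bpf_func)(unsigned long *args);
};

/* Hypothetical list node mirroring the prog/next layout in the patch. */
struct model_tramp_prog {
	struct model_prog *prog;
	struct model_tramp_prog *next;
};

/* Walk the list and call each program through its function pointer. */
static unsigned int run_progs(struct model_tramp_prog *tp, unsigned long *args)
{
	unsigned int count = 0;

	while (tp) {
		tp->prog->bpf_func(args);	/* plain indirect call, no asm */
		count++;
		tp = tp->next;
	}
	return count;
}

/* Example program: increments the first argument slot. */
static unsigned int inc_arg0(unsigned long *args)
{
	args[0]++;
	return 0;
}
```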

> +               }
> +
> +               bpf_gtramp_exit(prog, start_time, run_ctx);
> +               tramp_prog = tramp_prog->next;
> +       }
> +}
> +
> +static __always_inline notrace int
> +bpf_global_caller_run(unsigned long *args, unsigned long *ip, int nr_args)

Please share the top 10 from "perf report" while running the bench.
I'm curious about what's hot.
Last time I benchmarked fentry/fexit, migrate_disable/enable were
among the hottest functions. I suspect that's the case here as well.
