Message-ID: <CAADnVQ+7NhegoZGHkiRyNO8ywks3ssPzQd6ipQzumZsWUHJALg@mail.gmail.com>
Date: Tue, 15 Jul 2025 09:35:07 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Menglong Dong <menglong.dong@...ux.dev>
Cc: Menglong Dong <menglong8.dong@...il.com>, Steven Rostedt <rostedt@...dmis.org>,
Jiri Olsa <jolsa@...nel.org>, bpf <bpf@...r.kernel.org>,
Menglong Dong <dongml2@...natelecom.cn>, "H. Peter Anvin" <hpa@...or.com>,
Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
LKML <linux-kernel@...r.kernel.org>, Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next v2 02/18] x86,bpf: add bpf_global_caller for global trampoline

On Tue, Jul 15, 2025 at 1:37 AM Menglong Dong <menglong.dong@...ux.dev> wrote:
>
>
> On 7/15/25 10:25, Alexei Starovoitov wrote:
> > On Thu, Jul 3, 2025 at 5:17 AM Menglong Dong <menglong8.dong@...il.com> wrote:
> >> +static __always_inline void
> >> +do_origin_call(unsigned long *args, unsigned long *ip, int nr_args)
> >> +{
> >> +	/* Following code will be optimized by the compiler, as nr_args
> >> +	 * is a const, and there will be no condition here.
> >> +	 */
> >> +	if (nr_args == 0) {
> >> +		asm volatile(
> >> +			RESTORE_ORIGIN_0 CALL_NOSPEC "\n"
> >> +			"movq %%rax, %0\n"
> >> +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> >> +			: [args]"r"(args), [thunk_target]"r"(*ip)
> >> +			:
> >> +		);
> >> +	} else if (nr_args == 1) {
> >> +		asm volatile(
> >> +			RESTORE_ORIGIN_1 CALL_NOSPEC "\n"
> >> +			"movq %%rax, %0\n"
> >> +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> >> +			: [args]"r"(args), [thunk_target]"r"(*ip)
> >> +			: "rdi"
> >> +		);
> >> +	} else if (nr_args == 2) {
> >> +		asm volatile(
> >> +			RESTORE_ORIGIN_2 CALL_NOSPEC "\n"
> >> +			"movq %%rax, %0\n"
> >> +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> >> +			: [args]"r"(args), [thunk_target]"r"(*ip)
> >> +			: "rdi", "rsi"
> >> +		);
> >> +	} else if (nr_args == 3) {
> >> +		asm volatile(
> >> +			RESTORE_ORIGIN_3 CALL_NOSPEC "\n"
> >> +			"movq %%rax, %0\n"
> >> +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> >> +			: [args]"r"(args), [thunk_target]"r"(*ip)
> >> +			: "rdi", "rsi", "rdx"
> >> +		);
> >> +	} else if (nr_args == 4) {
> >> +		asm volatile(
> >> +			RESTORE_ORIGIN_4 CALL_NOSPEC "\n"
> >> +			"movq %%rax, %0\n"
> >> +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> >> +			: [args]"r"(args), [thunk_target]"r"(*ip)
> >> +			: "rdi", "rsi", "rdx", "rcx"
> >> +		);
> >> +	} else if (nr_args == 5) {
> >> +		asm volatile(
> >> +			RESTORE_ORIGIN_5 CALL_NOSPEC "\n"
> >> +			"movq %%rax, %0\n"
> >> +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> >> +			: [args]"r"(args), [thunk_target]"r"(*ip)
> >> +			: "rdi", "rsi", "rdx", "rcx", "r8"
> >> +		);
> >> +	} else if (nr_args == 6) {
> >> +		asm volatile(
> >> +			RESTORE_ORIGIN_6 CALL_NOSPEC "\n"
> >> +			"movq %%rax, %0\n"
> >> +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> >> +			: [args]"r"(args), [thunk_target]"r"(*ip)
> >> +			: "rdi", "rsi", "rdx", "rcx", "r8", "r9"
> >> +		);
> >> +	}
> >> +}
> > What is the performance difference between 0-6 variants?
> > I would think save/restore of regs shouldn't be that expensive.
> > bpf trampoline saves only what's necessary because it can do
> > this micro optimization, but for this one, I think, doing
> > _one_ global trampoline that covers all cases will simplify the code
> > a lot, but please benchmark the difference to understand
> > the trade-off.
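
For readers following the thread, the two designs under discussion can be
contrasted with a small user-space sketch. It is not kernel code and not the
patch itself: struct fake_regs, restore_exact() and restore_all() are
hypothetical names, and plain array stores stand in for the register moves
done by the RESTORE_ORIGIN_* macros. It only illustrates that a per-count
variant touches nr_args slots while a single variant always touches all six.

```c
#include <assert.h>

#define MAX_ARGS 6

/* Hypothetical stand-in for the six argument registers (rdi..r9). */
struct fake_regs {
	unsigned long r[MAX_ARGS];
	int loads;	/* number of register restores performed */
};

/*
 * Per-count variant: restore only the slots the traced function uses.
 * When nr_args is a literal at the call site and the helper is forced
 * inline, the compiler is free to unroll the loop for that count.
 */
static inline __attribute__((always_inline)) void
restore_exact(struct fake_regs *regs, const unsigned long *args, int nr_args)
{
	assert(nr_args >= 0 && nr_args <= MAX_ARGS);
	for (int i = 0; i < nr_args; i++) {
		regs->r[i] = args[i];
		regs->loads++;
	}
}

/* Single variant: always restore all six slots, whatever the callee takes. */
static void restore_all(struct fake_regs *regs, const unsigned long *args)
{
	for (int i = 0; i < MAX_ARGS; i++) {
		regs->r[i] = args[i];
		regs->loads++;
	}
}
```

The single variant does more work per call but needs no knowledge of the
argument count, which is the simplification being asked about here.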
>
> According to my benchmark, the *5* variant has ~5% overhead for
> save/restore when compared with the *0* variant. The save/restore of regs
> is fast, but it still needs 12 insns, which can produce ~6% overhead.
I think it's an OK trade-off, because with one global trampoline
we do not need to do an rhashtable lookup before entering the bpf prog.
The bpf prog will do the lookup on demand if/when it needs to access arguments.
This will compensate for the bit of performance lost to the extra save/restore.
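
A minimal user-space sketch of the "lookup on demand" idea, under stated
assumptions: struct func_meta, struct tramp_ctx, lookup_meta() and ctx_meta()
are hypothetical names, and a linear table scan stands in for the rhashtable
lookup. It is not how the kernel or this series implements it; it only shows
that a program which never reads arguments never pays for the lookup, and a
program that does pays once per context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-function metadata, keyed by function address. */
struct func_meta {
	unsigned long ip;
	int nr_args;
};

static const struct func_meta meta_table[] = {
	{ 0x1000, 2 }, { 0x2000, 0 }, { 0x3000, 5 },
};

static int lookups;	/* counts how often the table is searched */

/* Linear scan standing in for the hash-table lookup. */
static const struct func_meta *lookup_meta(unsigned long ip)
{
	lookups++;
	for (size_t i = 0; i < sizeof(meta_table) / sizeof(meta_table[0]); i++)
		if (meta_table[i].ip == ip)
			return &meta_table[i];
	return NULL;
}

/* Context handed to a program; meta stays NULL until first needed. */
struct tramp_ctx {
	unsigned long ip;
	unsigned long args[6];
	const struct func_meta *meta;
};

/* Resolve metadata lazily and cache a successful result in the context. */
static const struct func_meta *ctx_meta(struct tramp_ctx *ctx)
{
	if (!ctx->meta)
		ctx->meta = lookup_meta(ctx->ip);
	return ctx->meta;
}

/* Program that only counts calls: never touches args, never looks up. */
static unsigned long prog_count_calls(struct tramp_ctx *ctx)
{
	static unsigned long calls;

	(void)ctx;
	return ++calls;
}

/* Program that reads the last argument: needs nr_args, so it looks up. */
static unsigned long prog_last_arg(struct tramp_ctx *ctx)
{
	const struct func_meta *m = ctx_meta(ctx);

	if (!m || m->nr_args == 0)
		return 0;
	return ctx->args[m->nr_args - 1];
}
```

With the eager design every entry would pay for lookup_meta(); here only
prog_last_arg() does, and only on its first call for a given context.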
PS
Please don't add your chinatelecom.cn email to Cc.
Gmail just cannot deliver there, and it's annoying to keep deleting
it manually in every reply.