Message-ID: <CAADnVQ+Afov4E=9t=3M=zZmO9z4ZqT6imWD5xijDHshTf3J=RA@mail.gmail.com>
Date: Wed, 16 Jul 2025 09:56:11 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Menglong Dong <menglong.dong@...ux.dev>, Peter Zijlstra <peterz@...radead.org>
Cc: Menglong Dong <menglong8.dong@...il.com>, Steven Rostedt <rostedt@...dmis.org>,
Jiri Olsa <jolsa@...nel.org>, bpf <bpf@...r.kernel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman <eddyz87@...il.com>,
LKML <linux-kernel@...r.kernel.org>, Network Development <netdev@...r.kernel.org>
Subject: Inlining migrate_disable/enable. Was: [PATCH bpf-next v2 02/18]
x86,bpf: add bpf_global_caller for global trampoline
On Tue, Jul 15, 2025 at 2:31 AM Menglong Dong <menglong.dong@...ux.dev> wrote:
>
> Following are the test results for fentry-multi:
> 36.36% bpf_prog_2dcccf652aac1793_bench_trigger_fentry_multi [k]
> bpf_prog_2dcccf652aac1793_bench_trigger_fentry_multi
> 20.54% [kernel] [k] migrate_enable
> 19.35% [kernel] [k] bpf_global_caller_5_run
> 6.52% [kernel] [k] bpf_global_caller_5
> 3.58% libc.so.6 [.] syscall
> 2.88% [kernel] [k] entry_SYSCALL_64
> 1.50% [kernel] [k] memchr_inv
> 1.39% [kernel] [k] fput
> 1.04% [kernel] [k] migrate_disable
> 0.91% [kernel] [k] _copy_to_user
>
> And I also did the testing for fentry:
> 54.63% bpf_prog_2dcccf652aac1793_bench_trigger_fentry [k]
> bpf_prog_2dcccf652aac1793_bench_trigger_fentry
> 10.43% [kernel] [k] migrate_enable
> 10.07% bpf_trampoline_6442517037 [k] bpf_trampoline_6442517037
> 8.06% [kernel] [k] __bpf_prog_exit_recur
> 4.11% libc.so.6 [.] syscall
> 2.15% [kernel] [k] entry_SYSCALL_64
> 1.48% [kernel] [k] memchr_inv
> 1.32% [kernel] [k] fput
> 1.16% [kernel] [k] _copy_to_user
> 0.73% [kernel] [k] bpf_prog_test_run_raw_tp
Let's pause fentry-multi stuff and fix this as a higher priority.
Since migrate_disable/enable is so hot in yours and my tests,
let's figure out how to inline it.
As far as I can see, both functions can be moved to a header file
along with the this_rq() macro, but we need to keep
struct rq private to sched.h, so moving the whole struct is not an option.
Luckily we only need nr_pinned from there.
Maybe we can compute offsetof(struct rq, nr_pinned) in a precompile step,
the way it's done for asm-offsets?
And then use that constant to do nr_pinned ++, --.
__set_cpus_allowed_ptr() is a slow path and can stay in a .c file.
Maybe Peter has better ideas ?