lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CADxym3Yh630_NqGsvg3=w_D=K1WdvJ1v=7gvEYg4k0x_jAKAUA@mail.gmail.com>
Date: Fri, 20 Jun 2025 22:38:55 +0800
From: Menglong Dong <menglong8.dong@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	John Fastabend <john.fastabend@...il.com>, Andrii Nakryiko <andrii@...nel.org>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, Eduard <eddyz87@...il.com>, Song Liu <song@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>, 
	Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, 
	bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, 
	Menglong Dong <dongml2@...natelecom.cn>
Subject: Re: [PATCH bpf-next] bpf: make update_prog_stats always_inline

On Thu, Jun 19, 2025 at 2:07 AM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Wed, Jun 18, 2025 at 1:58 AM Menglong Dong <menglong8.dong@...il.com> wrote:
> >
> > The function update_prog_stats() will be called in the bpf trampoline.
> > Make it always_inline to reduce the overhead.
>
> What kind of difference did you measure ?

Hi, Alexei. I think I made some things wrong. The update_prog_stats is already
optimized by the compiler for the current code by inline it. I observed the CPU
consumption of update_prog_stats() by perf in my global trampoline, but it
doesn't exist in bpf trampoline.

Anyway, I think it is worth to make it inline manually here, as we can't rely on
the compiler all the time. When I add noinline to update_prog_stats(), the
performance of bench/trig-fentry decrease from 120M/s to 113M/s.

>
> > Signed-off-by: Menglong Dong <dongml2@...natelecom.cn>
> > ---
> >  kernel/bpf/trampoline.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> > index c4b1a98ff726..134bcfd00b15 100644
> > --- a/kernel/bpf/trampoline.c
> > +++ b/kernel/bpf/trampoline.c
> > @@ -911,8 +911,8 @@ static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tram
> >         return bpf_prog_start_time();
> >  }
> >
> > -static void notrace update_prog_stats(struct bpf_prog *prog,
> > -                                     u64 start)
> > +static __always_inline void notrace update_prog_stats(struct bpf_prog *prog,
> > +                                                     u64 start)
> >  {
>
> How about the following instead:
> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index c4b1a98ff726..728bb2845f41 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -911,28 +911,23 @@ static u64 notrace __bpf_prog_enter_recur(struct
> bpf_prog *prog, struct bpf_tram
>      return bpf_prog_start_time();
>  }
>
> -static void notrace update_prog_stats(struct bpf_prog *prog,
> -                      u64 start)
> +static noinline void notrace __update_prog_stats(struct bpf_prog *prog,

Does "noinline" have any special meaning here? It seems that
we don't need it :/

> +                         u64 start)
>  {
>      struct bpf_prog_stats *stats;
> -
> -    if (static_branch_unlikely(&bpf_stats_enabled_key) &&
> -        /* static_key could be enabled in __bpf_prog_enter*
> -         * and disabled in __bpf_prog_exit*.
> -         * And vice versa.
> -         * Hence check that 'start' is valid.
> -         */
> -        start > NO_START_TIME) {
> -        u64 duration = sched_clock() - start;
> -        unsigned long flags;
> -
> -        stats = this_cpu_ptr(prog->stats);
> -        flags = u64_stats_update_begin_irqsave(&stats->syncp);
> -        u64_stats_inc(&stats->cnt);
> -        u64_stats_add(&stats->nsecs, duration);
> -        u64_stats_update_end_irqrestore(&stats->syncp, flags);
> -    }
> +    u64 duration = sched_clock() - start;
> +    unsigned long flags;
> +
> +    stats = this_cpu_ptr(prog->stats);
> +    flags = u64_stats_update_begin_irqsave(&stats->syncp);
> +    u64_stats_inc(&stats->cnt);
> +    u64_stats_add(&stats->nsecs, duration);
> +    u64_stats_update_end_irqrestore(&stats->syncp, flags);
>  }
> +#define update_prog_stats(prog, start) \
> +    if (static_branch_unlikely(&bpf_stats_enabled_key) && \
> +        start > NO_START_TIME) \
> +        __update_prog_stats(prog, start)
>
>  static void notrace __bpf_prog_exit_recur(struct bpf_prog *prog, u64 start,
>                        struct bpf_tramp_run_ctx *run_ctx)
>
>
> Maybe
> if (start > NO_START_TIME)
> should stay within __update_prog_stats().
>
> pls run a few experiments.

Yeah, I think it is much better in this way. And the performance stay exactly
the same after the midifition, which means it has the same effect as
the compiler's automatical optimization.

Thanks!
Menglong Dong

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ