[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADxym3ZaiGYJWd-ME98G_=7q0EZA-sU7G=x=j5kcnNgRJ0893A@mail.gmail.com>
Date: Tue, 15 Jul 2025 11:13:43 +0800
From: Menglong Dong <menglong8.dong@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Steven Rostedt <rostedt@...dmis.org>, Jiri Olsa <jolsa@...nel.org>, bpf <bpf@...r.kernel.org>,
Menglong Dong <dongml2@...natelecom.cn>, Martin KaFai Lau <martin.lau@...ux.dev>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH bpf-next v2 01/18] bpf: add function hash table for tracing-multi
On Tue, Jul 15, 2025 at 10:49 AM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Mon, Jul 14, 2025 at 7:38 PM Menglong Dong <menglong8.dong@...il.com> wrote:
> >
> > On Tue, Jul 15, 2025 at 9:55 AM Alexei Starovoitov
> > <alexei.starovoitov@...il.com> wrote:
> > >
> > > On Thu, Jul 3, 2025 at 5:17 AM Menglong Dong <menglong8.dong@...il.com> wrote:
> > > >
> > > > We don't use rhashtable here, as the compiler is not clever enough and it
> > > > refused to inline the hash lookup for me, which bring in addition overhead
> > > > in the following BPF global trampoline.
> > >
> > > That's not good enough justification.
> > > rhashtable is used in many performance critical components.
> > > You need to figure out what was causing compiler not to inline lookup
> > > in your case.
> > > Did you make sure that params are constant as I suggested earlier?
> > > If 'static inline' wasn't enough, have you tried always_inline ?
> >
> > Yeah, I'm sure all the params are constant. always_inline works, but I have
> > to replace the "inline" with "__always_inline" for rhashtable_lookup_fast,
> > rhashtable_lookup, __rhashtable_lookup, rht_key_get_hash, etc. After that,
> > everything will be inlined.
>
> That doesn't sound right.
> When everything is always_inline the compiler can inline the callback hashfn.
> Without always inline do use see ht->p.hashfn in the assembly?
> If so, the compiler is taking this path:
> if (!__builtin_constant_p(params.key_len))
> hash = ht->p.hashfn(key, ht->key_len, hash_rnd);
>
> which is back to const params.
I think the compiler thinks the bpf_global_caller is complex enough and
refuses to inline it for me, and a call to __rhashtable_lookup() happens.
When I add always_inline to __rhashtable_lookup(), the compiler makes
a call to rht_key_get_hash(), which is annoying. And I'm sure the params.key_len
is const, and the function call is not for the ht->p.hashfn.
>
> > In fact, I think rhashtable is not good enough in our case, which
> > has high performance requirements. With rhashtable, the insn count
> > is 35 to finish the hash lookup. With the hash table here, it needs only
> > 17 insn, which means the rhashtable introduces ~5% overhead.
>
> I feel you're not using rhashtable correctly.
> Try disasm of xdp_unreg_mem_model().
> The inlined lookup is quite small.
Okay, I'll disasm it and have a look. In my case, it does consume 35 insn
after I disasm it.
>
> > We need to protect the md the same as how we protect the trampoline image,
> > as it is used in the global trampoline from the beginning to the ending.
> > The rcu_tasks, rcu_task_trace, percpu_ref is used for that purpose. It's
> > complex, but it is really the same as what we do in bpf_tramp_image_put().
> > You wrote that part, and I think you understand me :/
>
> Sounds like you copied it without understanding :(
> bpf trampoline is dynamic. It can go away and all the complexity
> because of that. global trampoline is global.
> It never gets freed. It doesn't need any of bpf trampoline complexity.
Hi, I think you misunderstand something. I'm not protecting the
global trampoline, and it doesn't need to be protected. I'm protecting
the kfunc_md.
In the global trampoline, we will look up the kfunc_md of the current ip
and use it. And it will be used from the beginning to the ending in the
global trampoline, so we need to protect it.
But we can see that the life of the kfunc_md is exactly the same as
the bpf trampoline image. So we can use the same way to protect, which
prevent it from being freed if it is still used in the global trampoline.
Powered by blists - more mailing lists