[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAADnVQJYLSp0X-LiPftaDvU+SnJL84sgGM0M-=uQgq4g8=T=zg@mail.gmail.com>
Date: Mon, 14 Jul 2025 19:49:18 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Menglong Dong <menglong8.dong@...il.com>
Cc: Steven Rostedt <rostedt@...dmis.org>, Jiri Olsa <jolsa@...nel.org>, bpf <bpf@...r.kernel.org>,
Menglong Dong <dongml2@...natelecom.cn>, Martin KaFai Lau <martin.lau@...ux.dev>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH bpf-next v2 01/18] bpf: add function hash table for tracing-multi
On Mon, Jul 14, 2025 at 7:38 PM Menglong Dong <menglong8.dong@...il.com> wrote:
>
> On Tue, Jul 15, 2025 at 9:55 AM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Thu, Jul 3, 2025 at 5:17 AM Menglong Dong <menglong8.dong@...il.com> wrote:
> > >
> > > We don't use rhashtable here, as the compiler is not clever enough and it
> > > refused to inline the hash lookup for me, which bring in addition overhead
> > > in the following BPF global trampoline.
> >
> > That's not good enough justification.
> > rhashtable is used in many performance critical components.
> > You need to figure out what was causing compiler not to inline lookup
> > in your case.
> > Did you make sure that params are constant as I suggested earlier?
> > If 'static inline' wasn't enough, have you tried always_inline ?
>
> Yeah, I'm sure all the params are constant. always_inline works, but I have
> to replace the "inline" with "__always_inline" for rhashtable_lookup_fast,
> rhashtable_lookup, __rhashtable_lookup, rht_key_get_hash, etc. After that,
> everything will be inlined.
That doesn't sound right.
When everything is always_inline the compiler can inline the callback hashfn.
Without always inline do use see ht->p.hashfn in the assembly?
If so, the compiler is taking this path:
if (!__builtin_constant_p(params.key_len))
hash = ht->p.hashfn(key, ht->key_len, hash_rnd);
which is back to const params.
> In fact, I think rhashtable is not good enough in our case, which
> has high performance requirements. With rhashtable, the insn count
> is 35 to finish the hash lookup. With the hash table here, it needs only
> 17 insn, which means the rhashtable introduces ~5% overhead.
I feel you're not using rhashtable correctly.
Try disasm of xdp_unreg_mem_model().
The inlined lookup is quite small.
> We need to protect the md the same as how we protect the trampoline image,
> as it is used in the global trampoline from the beginning to the ending.
> The rcu_tasks, rcu_task_trace, percpu_ref is used for that purpose. It's
> complex, but it is really the same as what we do in bpf_tramp_image_put().
> You wrote that part, and I think you understand me :/
Sounds like you copied it without understanding :(
bpf trampoline is dynamic. It can go away and all the complexity
because of that. global trampoline is global.
It never gets freed. It doesn't need any of bpf trampoline complexity.
Powered by blists - more mailing lists