Message-ID: <CADxym3ZYCYgFokxoq0d5jEJ8V73KsJmYQnHtxWc3RO_8X5zC8Q@mail.gmail.com>
Date: Tue, 15 Jul 2025 17:06:04 +0800
From: Menglong Dong <menglong8.dong@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Steven Rostedt <rostedt@...dmis.org>, Jiri Olsa <jolsa@...nel.org>, bpf <bpf@...r.kernel.org>,
Menglong Dong <dongml2@...natelecom.cn>, Martin KaFai Lau <martin.lau@...ux.dev>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH bpf-next v2 01/18] bpf: add function hash table for tracing-multi
On Tue, Jul 15, 2025 at 11:13 AM Menglong Dong <menglong8.dong@...il.com> wrote:
>
> On Tue, Jul 15, 2025 at 10:49 AM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Mon, Jul 14, 2025 at 7:38 PM Menglong Dong <menglong8.dong@...il.com> wrote:
[......]
> >
> > That doesn't sound right.
> > When everything is always_inline the compiler can inline the callback hashfn.
> > Without always_inline, do you see ht->p.hashfn in the assembly?
> > If so, the compiler is taking this path:
> > if (!__builtin_constant_p(params.key_len))
> > hash = ht->p.hashfn(key, ht->key_len, hash_rnd);
> >
> > which is back to const params.
>
> I think the compiler thinks the bpf_global_caller is complex enough and
> refuses to inline it for me, and a call to __rhashtable_lookup() happens.
> When I add always_inline to __rhashtable_lookup(), the compiler makes
> a call to rht_key_get_hash(), which is annoying. And I'm sure the params.key_len
> is const, and the function call is not for the ht->p.hashfn.
>
> >
> > > In fact, I think rhashtable is not good enough in our case, which
> > > has high performance requirements. With rhashtable, the insn count
> > > is 35 to finish the hash lookup. With the hash table here, it needs only
> > > 17 insn, which means the rhashtable introduces ~5% overhead.
> >
> > I feel you're not using rhashtable correctly.
> > Try disasm of xdp_unreg_mem_model().
> > The inlined lookup is quite small.
>
> Okay, I'll disasm it and have a look. In my case, it does consume 35 insn
> after I disasm it.
You might not believe this, but the rhashtable lookup in my kernel is
not inlined in xdp_unreg_mem_model(). Here is the disassembly:
disassemble xdp_unreg_mem_model
Dump of assembler code for function xdp_unreg_mem_model:
0xffffffff81e68760 <+0>: call 0xffffffff8127f9d0 <__fentry__>
0xffffffff81e68765 <+5>: push %rbx
0xffffffff81e68766 <+6>: sub $0x10,%rsp
[......]
/* The call to __rhashtable_lookup() happens on the next line. */
0xffffffff81e687ba <+90>: call 0xffffffff81e686c0 <__rhashtable_lookup>
0xffffffff81e687bf <+95>: test %rax,%rax
0xffffffff81e687c2 <+98>: je 0xffffffff81e687cb <xdp_unreg_mem_model+107>
[......]
The gcc that I'm using is:
gcc --version
gcc (Debian 12.2.0-14+deb12u1) 12.2.0
Could there be something wrong with rhashtable that needs fixing?
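
For readers following the inlining argument above, here is a minimal
userspace sketch of the __builtin_constant_p() pattern being discussed.
It is not taken from the kernel tree or from this patch series: every
demo_* name, the struct layouts, and the FNV-1a style hash are
assumptions made up for illustration. It shows the two paths Alexei
describes. When the params argument is visible to the compiler as a
constant (which usually requires the helper to be inlined and
optimization to be enabled), the hash function can be called directly.
Otherwise the code falls back to an indirect call through the runtime
copy stored in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rhashtable_params; not the kernel type. */
struct demo_params {
	size_t key_len;
	uint32_t (*hashfn)(const void *key, size_t len, uint32_t seed);
};

/* Hypothetical stand-in for struct rhashtable: keeps a runtime copy of params. */
struct demo_table {
	struct demo_params p;
	uint32_t seed;
};

/* FNV-1a style hash with the seed mixed in; used only as an example hash. */
static uint32_t demo_hash(const void *key, size_t len, uint32_t seed)
{
	const unsigned char *bytes = key;
	uint32_t h = 2166136261u ^ seed;

	for (size_t i = 0; i < len; i++) {
		h ^= bytes[i];
		h *= 16777619u;
	}
	return h;
}

static inline __attribute__((always_inline)) uint32_t
demo_key_hash(const struct demo_table *ht, const void *key,
	      const struct demo_params params)
{
	/* params is not a compile-time constant here: use the runtime copy,
	 * which means an indirect call through ht->p.hashfn. */
	if (!__builtin_constant_p(params.key_len))
		return ht->p.hashfn(key, ht->p.key_len, ht->seed);

	/* params is constant: the callee and key length are known, so the
	 * compiler is free to inline the hash and drop the indirect call. */
	return params.hashfn(key, params.key_len, ht->seed);
}

/* Constant params, assumed to match what the table was set up with. */
static const struct demo_params demo_u64_params = {
	.key_len = sizeof(uint64_t),
	.hashfn = demo_hash,
};

uint32_t demo_lookup_hash(const struct demo_table *ht, const uint64_t *key)
{
	return demo_key_hash(ht, key, demo_u64_params);
}

/* Both paths must produce the same hash when ht->p equals the params. */
int demo_selfcheck(void)
{
	struct demo_table ht = { .p = demo_u64_params, .seed = 7 };
	uint64_t key = 0x1234;

	assert(demo_lookup_hash(&ht, &key) ==
	       demo_hash(&key, sizeof(key), ht.seed));
	return 0;
}
```

Whether a given compiler takes the constant path can only be confirmed
by inspecting the generated assembly (for example, by looking for an
indirect call in demo_lookup_hash), which is the same kind of check
performed on xdp_unreg_mem_model() above. The result may differ between
compiler versions and optimization levels.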