Message-ID: <CALz3k9idLX10+Gh18xWepwtgvp4VZ3zQfY4aoNXn0gCh8Fs_fA@mail.gmail.com>
Date: Sat, 30 Mar 2024 11:18:29 +0800
From: 梦龙董 <dongmenglong.8@...edance.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>, Jiri Olsa <jolsa@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, Martin KaFai Lau <martin.lau@...ux.dev>, Eddy Z <eddyz87@...il.com>,
Song Liu <song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>,
John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, "David S. Miller" <davem@...emloft.net>,
David Ahern <dsahern@...nel.org>, Dave Hansen <dave.hansen@...ux.intel.com>,
X86 ML <x86@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Quentin Monnet <quentin@...valent.com>, bpf <bpf@...r.kernel.org>,
linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>, LKML <linux-kernel@...r.kernel.org>,
linux-riscv <linux-riscv@...ts.infradead.org>, linux-s390 <linux-s390@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>, linux-trace-kernel@...r.kernel.org,
"open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@...r.kernel.org>, linux-stm32@...md-mailman.stormreply.com
Subject: Re: [External] Re: [PATCH bpf-next v2 1/9] bpf: tracing: add support
to record and check the accessed args
On Thu, Mar 28, 2024 at 11:11 PM Steven Rostedt <rostedt@...dmis.org> wrote:
>
> On Thu, 28 Mar 2024 22:43:46 +0800
> 梦龙董 <dongmenglong.8@...edance.com> wrote:
>
> > I ran a simple benchmark that creates 1000 trampolines.
> > It is quite slow, taking up to 60s. We can't do it
> > this way.
> >
> > Now, I have a bad idea. How about we introduce
> > a "dynamic trampoline"? The basic logic of it can be:
> >
> > """
> > save regs
> > bpfs = trampoline_lookup_ip(ip)
> > fentry = bpfs->fentries
> > while fentry:
> >     fentry(ctx)
> >     fentry = fentry->next
> >
> > call origin
> > save return value
> >
> > fexit = bpfs->fexits
> > while fexit:
> >     fexit(ctx)
> >     fexit = fexit->next
> >
> > xxxxxx
> > """
> >
> > We look up the "bpfs" by the function ip in a hash map
> > in trampoline_lookup_ip. The type of "bpfs" is:
> >
> > struct bpf_array {
> >     struct bpf_prog *fentries;
> >     struct bpf_prog *fexits;
> >     struct bpf_prog *modify_returns;
> > };
> >
> > When we need to attach the bpf progA to functions A/B/C,
> > we only need to create bpf_arrayA, bpf_arrayB and bpf_arrayC,
> > add progA to them, insert them into the hash map
> > "direct_call_bpfs", and attach the "dynamic trampoline" to
> > A/B/C. If bpf_arrayA already exists, just add progA to the
> > tail of bpf_arrayA->fentries. When we need to attach progB
> > to B/C, just add progB to bpf_arrayB->fentries and
> > bpf_arrayC->fentries.
> >
> > Compared to the trampoline, the extra overhead comes
> > from the hash lookup.
> >
> > I have not begun to code yet, and I am not sure the overhead is
> > acceptable. Considering that kprobe_multi also needs to do a
> > hash lookup by the function, maybe the overhead is
> > acceptable?
>
> Sounds like you are just recreating the function management that ftrace
> has. It also can add thousands of trampolines very quickly, because it does
> it in batches. It takes special synchronization steps to attach to fentry.
> ftrace (and I believe multi-kprobes) updates all the attachments for each
> step, so the synchronization needed is only done once.
>
Yes, registering a trampoline for a kernel function is fast
in the ftrace-managed case, via
register_fentry->register_ftrace_direct->ftrace_add_rec_direct.
This adds the trampoline to the hash table "direct_functions".
The trampoline is then called through the following
steps (I'm not sure if I understand it correctly):

  ftrace_regs_caller
    |
    __ftrace_ops_list_func -> call_direct_funcs
    |     (saves the trampoline to pt_regs->orig_ax)
    |
    call pt_regs->orig_ax if not NULL

The logic above means that we can only call a
trampoline once, and that a kernel function can only
have one trampoline.

My original idea was to register all of the shared
trampolines with the managed ftrace. For example, if we have
shared trampoline1 for functions A/B/C and shared
trampoline2 for functions B/C/D, then both trampoline1
and trampoline2 get registered for functions B/C. However,
that can't work, as we can't call 2 trampolines for one function.
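
The constraint can be pictured with a small user-space model. This is only
a sketch, not the kernel's code: the table layout, the hash, the names
direct_slot/direct_register/direct_lookup and the error codes are all
assumptions made for illustration. What it shows is that a table keyed by
the function ip with a single trampoline per entry has no room for a
second trampoline on the same function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DIRECT_SLOTS 64             /* hypothetical table size */

struct direct_entry {
    uintptr_t ip;                   /* traced function address; 0 = empty slot */
    uintptr_t trampoline;           /* the single trampoline for that address */
};

static struct direct_entry direct_table[DIRECT_SLOTS];

/* Linear probing: return the slot that holds ip, or the first empty
 * slot on its probe path, or NULL if the table is full. */
static struct direct_entry *direct_slot(uintptr_t ip)
{
    size_t start = (ip >> 4) % DIRECT_SLOTS;

    for (size_t n = 0; n < DIRECT_SLOTS; n++) {
        struct direct_entry *e = &direct_table[(start + n) % DIRECT_SLOTS];

        if (e->ip == ip || e->ip == 0)
            return e;
    }
    return NULL;
}

/* Attach a trampoline to ip. A second attach to the same ip is refused,
 * because the entry can only hold one trampoline address. */
static int direct_register(uintptr_t ip, uintptr_t trampoline)
{
    struct direct_entry *e;

    assert(ip != 0 && trampoline != 0);
    e = direct_slot(ip);
    if (!e)
        return -ENOSPC;
    if (e->ip == ip)
        return -EBUSY;              /* error code chosen for this sketch */
    e->ip = ip;
    e->trampoline = trampoline;
    return 0;
}

/* Return the trampoline attached to ip, or 0 if there is none. */
static uintptr_t direct_lookup(uintptr_t ip)
{
    struct direct_entry *e = direct_slot(ip);

    return (e && e->ip == ip) ? e->trampoline : 0;
}
```

In this model, registering trampoline1 for B succeeds and registering
trampoline2 for the same B is refused, which is the situation described
above for functions B/C.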

Then I thought that we could create a "dynamic trampoline".
The logic for the non-ftrace-managed case is simple: we
only need to replace the "nop" of every target function
with "call dynamic_trampoline". For the ftrace-managed
case the logic is the same, except that the trampoline
we add to the "direct_functions" hash is the
dynamic-trampoline:

  ftrace_regs_caller
    |
    __ftrace_ops_list_func -> call_direct_funcs
    |     (saves the dynamic-trampoline to pt_regs->orig_ax)
    |
    call pt_regs->orig_ax (dynamic-trampoline) if not NULL

In the dynamic-trampoline, we can call prog1 for A,
prog1 and prog2 for B/C, and prog2 for D.
Registration is fast enough this way.
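
A rough user-space sketch of that dispatch, to make the idea concrete.
Nothing here is kernel code: func_progs, prog_link, the bucket count and
the demo programs are hypothetical names for illustration, and only
trampoline_lookup_ip comes from the pseudocode quoted earlier. One
deliberate difference from that pseudocode: the list node is a separate
per-attachment link rather than a "next" pointer inside the program,
since the same program may sit in the lists of several functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One node per (function, program) attachment. */
struct prog_link {
    void (*run)(void *ctx);
    struct prog_link *next;
};

/* Per-function program lists, looked up by the function's address. */
struct func_progs {
    uintptr_t ip;
    struct prog_link *fentries;
    struct prog_link *fexits;
    struct func_progs *hash_next;   /* bucket chain */
};

#define NBUCKETS 16                 /* hypothetical size */
static struct func_progs *buckets[NBUCKETS];

static size_t bucket_of(uintptr_t ip)
{
    return (ip >> 4) % NBUCKETS;
}

static void func_progs_insert(struct func_progs *fp)
{
    size_t b;

    assert(fp != NULL);
    b = bucket_of(fp->ip);
    fp->hash_next = buckets[b];
    buckets[b] = fp;
}

static struct func_progs *trampoline_lookup_ip(uintptr_t ip)
{
    struct func_progs *fp = buckets[bucket_of(ip)];

    while (fp && fp->ip != ip)
        fp = fp->hash_next;
    return fp;
}

/* Append to the tail so programs run in attach order. */
static void link_append(struct prog_link **head, struct prog_link *link)
{
    while (*head)
        head = &(*head)->next;
    link->next = NULL;
    *head = link;
}

/* The shared entry point: run the fentry list for ip, call the
 * original function, then run the fexit list for ip. */
static long dynamic_trampoline(uintptr_t ip, long (*origin)(void *ctx),
                               void *ctx)
{
    struct func_progs *fp = trampoline_lookup_ip(ip);
    struct prog_link *l;
    long ret;

    for (l = fp ? fp->fentries : NULL; l; l = l->next)
        l->run(ctx);
    ret = origin(ctx);
    for (l = fp ? fp->fexits : NULL; l; l = l->next)
        l->run(ctx);
    return ret;
}

/* Demo programs and a demo traced function (illustration only). */
static int prog1_hits, prog2_hits;
static void prog1(void *ctx) { (void)ctx; prog1_hits++; }
static void prog2(void *ctx) { (void)ctx; prog2_hits++; }
static long traced_func(void *ctx) { return *(long *)ctx + 1; }
```

The cost of this layout is one hash lookup on every traced call; in
exchange, attaching a program to a function that already has an entry
only appends a link to a list.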
> If you really want to have thousands of functions, why not just register it
> with ftrace itself. It will give you the arguments via the ftrace_regs
> structure. Can't you just register a program as the callback?
>
Hmm... I don't understand. My main reasons for
using TRACING are:

1. We can directly access memory, which is more
   efficient.
2. We can obtain the function args in FEXIT, which
   kretprobe can't do. This is the main reason.

Thanks!
Menglong Dong

> It will probably make your accounting much easier, and just let ftrace
> handle the fentry logic. That's what it was made to do.
>
> -- Steve