Message-Id: <20231220100047.e33d862cb869423c2a3a82bf@kernel.org>
Date: Wed, 20 Dec 2023 10:00:47 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Jiri Olsa <olsajiri@...il.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>, Steven Rostedt
<rostedt@...dmis.org>, Florent Revest <revest@...omium.org>,
linux-trace-kernel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, bpf <bpf@...r.kernel.org>, Sven
Schnelle <svens@...ux.ibm.com>, Alexei Starovoitov <ast@...nel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Daniel Borkmann
<daniel@...earbox.net>, Alan Maguire <alan.maguire@...cle.com>, Mark
Rutland <mark.rutland@....com>, Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>, Guo Ren <guoren@...nel.org>
Subject: Re: [PATCH v5 28/34] fprobe: Rewrite fprobe on function-graph
tracer
On Tue, 19 Dec 2023 15:39:03 +0100
Jiri Olsa <olsajiri@...il.com> wrote:
> On Mon, Dec 18, 2023 at 10:17:10PM +0900, Masami Hiramatsu (Google) wrote:
>
> SNIP
>
> > -static void fprobe_exit_handler(struct rethook_node *rh, void *data,
> > - unsigned long ret_ip, struct pt_regs *regs)
> > +static int fprobe_entry(unsigned long func, unsigned long ret_ip,
> > + struct ftrace_regs *fregs, struct fgraph_ops *gops)
> > {
> > - struct fprobe *fp = (struct fprobe *)data;
> > - struct fprobe_rethook_node *fpr;
> > - struct ftrace_regs *fregs = (struct ftrace_regs *)regs;
> > - int bit;
> > + struct fprobe_hlist_node *node, *first;
> > + unsigned long *fgraph_data = NULL;
> > + unsigned long header;
> > + int reserved_words;
> > + struct fprobe *fp;
> > + int used, ret;
> >
> > - if (!fp || fprobe_disabled(fp))
> > - return;
> > + if (WARN_ON_ONCE(!fregs))
> > + return 0;
> >
> > - fpr = container_of(rh, struct fprobe_rethook_node, node);
> > + first = node = find_first_fprobe_node(func);
> > + if (unlikely(!first))
> > + return 0;
> > +
> > + reserved_words = 0;
> > + hlist_for_each_entry_from_rcu(node, hlist) {
> > + if (node->addr != func)
> > + break;
> > + fp = READ_ONCE(node->fp);
> > + if (!fp || !fp->exit_handler)
> > + continue;
> > +	/*
> > +	 * Since an fprobe can become enabled again before the next loop
> > +	 * runs, ignore the fprobe's disabled flag in this loop.
> > +	 */
> > + reserved_words +=
> > + DIV_ROUND_UP(fp->entry_data_size, sizeof(long)) + 1;
> > + }
> > + node = first;
> > + if (reserved_words) {
> > + fgraph_data = fgraph_reserve_data(gops->idx, reserved_words * sizeof(long));
> > + if (unlikely(!fgraph_data)) {
> > + hlist_for_each_entry_from_rcu(node, hlist) {
> > + if (node->addr != func)
> > + break;
> > + fp = READ_ONCE(node->fp);
> > + if (fp && !fprobe_disabled(fp))
> > + fp->nmissed++;
> > + }
> > + return 0;
> > + }
> > + }
>
> this looks expensive compared to what we do now.. IIUC due to the graph
> ops limitations (16 active ops), you have just a single graph_ops for fprobe,
> and each fprobe registration stores its ips into a hash, which you need to
> search in here to get the registered callbacks..?
I don't think this is so expensive. In most cases, it only hits 1 fprobe in the
hash. And if the fprobe is only used to hook function entry, reserved_words == 0.
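
For illustration, here is a minimal userspace sketch of the reserved_words
arithmetic above. The struct and the sizes are made up; only the
DIV_ROUND_UP(entry_data_size, sizeof(long)) + 1 formula comes from the patch.

#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

struct fake_fprobe {
	int		has_exit_handler;
	unsigned long	entry_data_size;
};

int main(void)
{
	/* Three hypothetical fprobes hashed to the same function address. */
	struct fake_fprobe probes[] = {
		{ 0, 0 },	/* entry only: contributes nothing      */
		{ 1, 12 },	/* DIV_ROUND_UP(12, 8) + 1 = 3 words    */
		{ 1, 0 },	/* 0 + 1 = 1 word (per-fprobe header)   */
	};
	int reserved_words = 0;

	for (unsigned int i = 0; i < sizeof(probes) / sizeof(probes[0]); i++) {
		if (!probes[i].has_exit_handler)
			continue;	/* no exit handler, no stack data */
		reserved_words += DIV_ROUND_UP(probes[i].entry_data_size,
					       sizeof(long)) + 1;
	}
	printf("reserved_words = %d\n", reserved_words);	/* 4 on LP64 */
	return 0;
}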
> I wonder, would it make sense to allow an arbitrary number of active graph_ops,
> at the price that some callbacks might fail because there's no stack space, so
> each fprobe instance would have its own graph_ops.. and we would get rid
> of the code above (and below)?
Yeah, actually my first implementation was exactly that. But I realized it
doesn't work: it requires an intermediate object with a refcounter, because the
"marker" on the shadow stack will be left behind after unregistering. We need to
identify which markers are still valid and which are not. And for that purpose,
we would need to introduce a similar structure in fgraph too.
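
To show what I mean by the intermediate object, a hedged sketch with invented
names (only refcount_t and its helpers are the real kernel API):

#include <linux/refcount.h>
#include <linux/slab.h>

struct fprobe_marker {
	refcount_t	ref;	/* one reference per marker on a shadow stack */
	struct fprobe	*fp;	/* set to NULL when the fprobe is unregistered */
};

static struct fprobe_marker *fprobe_marker_get(struct fprobe_marker *m)
{
	refcount_inc(&m->ref);
	return m;
}

static void fprobe_marker_put(struct fprobe_marker *m)
{
	/* Freed only after the last shadow-stack marker is consumed,
	 * so the exit path can still tell whether fp is alive. */
	if (refcount_dec_and_test(&m->ref))
		kfree(m);
}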
The current multi-fgraph does:
- if CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n (called from the dedicated mcount
  asm code), it has to loop over all fgraph_ops and check each hash, which is
  inefficient, but it can easily push the return trace entry on the shadow
  stack.
- if CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y (called from the ftrace asm code),
  it does not need to loop (that is done by ftrace), but each handler does
  NOT know who pushed the return trace entry on the shadow stack. Thus it has
  to decode the shadow stack and check whether it needs to push a return
  trace entry or not. And this is hard if the traced function is a self-
  recursive call or a tail call. To detect such recursive calls, I introduced
  a bitmap entry on the shadow stack. The size of this bitmap limits the max
  number of fgraph_ops (see the rough model below).
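
A rough userspace model of that bitmap entry; the bit layout and the names are
invented here, not the actual fgraph encoding:

#include <stdio.h>

/* Pretend one shadow-stack word splits into metadata bits plus a
 * bitmap with one bit per fgraph_ops index. */
#define META_BITS	16
#define MAX_FGRAPH_OPS	((int)(sizeof(long) * 8) - META_BITS)

static long mark_traced(long word, int idx)
{
	return word | (1L << (META_BITS + idx));
}

static int was_traced(long word, int idx)
{
	return !!(word & (1L << (META_BITS + idx)));
}

int main(void)
{
	long entry = 0;

	entry = mark_traced(entry, 3);	/* fgraph_ops #3 traced this frame */
	/* On return, only handlers whose bit is set get called. */
	printf("ops3=%d ops5=%d max_ops=%d\n",
	       was_traced(entry, 3), was_traced(entry, 5), MAX_FGRAPH_OPS);
	return 0;
}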
So, to lift the limit on the number of fgraph_ops, we may need to push each
fgraph's own index on the shadow stack, and each fgraph callback would have to
unwind the shadow stack to check whether its own index is there, instead of
checking a bit in the bitmap. That would be a more reliable way, but maybe slow.
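
A hedged sketch of that alternative, with an invented frame layout; each traced
frame records the indices of the fgraph_ops which pushed it, and every callback
has to scan them:

#include <stdbool.h>

struct shadow_frame {
	unsigned long	ret_addr;
	int		nr_idx;	/* how many fgraph indices were pushed */
	int		idx[];	/* one slot per fgraph_ops tracing this frame */
};

/* O(nr_idx) scan per callback, versus an O(1) bitmap test. */
static bool frame_traced_by(const struct shadow_frame *f, int my_idx)
{
	for (int i = 0; i < f->nr_idx; i++)
		if (f->idx[i] == my_idx)
			return true;
	return false;
}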
Another option is introducing a pair of pre- and post-callbacks which are
called before and after the list/direct call of the ftrace_ops. The
pre-callback would push the ret_stack entry on the shadow stack, and the
post-callback would commit or cancel it. (But this one is hard to design...
it may become an ugly interface.)
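
Purely as a speculative sketch (no such hooks exist in ftrace today, and all
names here are invented):

#include <linux/types.h>

struct fgraph_txn_ops {
	/* Called before the ftrace_ops list/direct call: tentatively
	 * push a ret_stack entry and hand back a cookie for it. */
	void *(*pre)(unsigned long func, unsigned long ret_ip);
	/* Called after the call: commit the entry if some handler
	 * actually used it, otherwise cancel (pop) it. */
	void (*post)(void *cookie, bool used);
};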
Thank you,
--
Masami Hiramatsu (Google) <mhiramat@...nel.org>