Message-Id: <20231220100047.e33d862cb869423c2a3a82bf@kernel.org>
Date: Wed, 20 Dec 2023 10:00:47 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Jiri Olsa <olsajiri@...il.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>, Steven Rostedt
<rostedt@...dmis.org>, Florent Revest <revest@...omium.org>,
linux-trace-kernel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, bpf <bpf@...r.kernel.org>, Sven
Schnelle <svens@...ux.ibm.com>, Alexei Starovoitov <ast@...nel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Daniel Borkmann
<daniel@...earbox.net>, Alan Maguire <alan.maguire@...cle.com>, Mark
Rutland <mark.rutland@....com>, Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>, Guo Ren <guoren@...nel.org>
Subject: Re: [PATCH v5 28/34] fprobe: Rewrite fprobe on function-graph
tracer
On Tue, 19 Dec 2023 15:39:03 +0100
Jiri Olsa <olsajiri@...il.com> wrote:
> On Mon, Dec 18, 2023 at 10:17:10PM +0900, Masami Hiramatsu (Google) wrote:
>
> SNIP
>
> > -static void fprobe_exit_handler(struct rethook_node *rh, void *data,
> > - unsigned long ret_ip, struct pt_regs *regs)
> > +static int fprobe_entry(unsigned long func, unsigned long ret_ip,
> > + struct ftrace_regs *fregs, struct fgraph_ops *gops)
> > {
> > - struct fprobe *fp = (struct fprobe *)data;
> > - struct fprobe_rethook_node *fpr;
> > - struct ftrace_regs *fregs = (struct ftrace_regs *)regs;
> > - int bit;
> > + struct fprobe_hlist_node *node, *first;
> > + unsigned long *fgraph_data = NULL;
> > + unsigned long header;
> > + int reserved_words;
> > + struct fprobe *fp;
> > + int used, ret;
> >
> > - if (!fp || fprobe_disabled(fp))
> > - return;
> > + if (WARN_ON_ONCE(!fregs))
> > + return 0;
> >
> > - fpr = container_of(rh, struct fprobe_rethook_node, node);
> > + first = node = find_first_fprobe_node(func);
> > + if (unlikely(!first))
> > + return 0;
> > +
> > + reserved_words = 0;
> > + hlist_for_each_entry_from_rcu(node, hlist) {
> > + if (node->addr != func)
> > + break;
> > + fp = READ_ONCE(node->fp);
> > + if (!fp || !fp->exit_handler)
> > + continue;
> > +	/*
> > +	 * Since an fprobe can become enabled again before the next loop
> > +	 * runs, ignore the fprobe's disabled flag in this loop.
> > +	 */
> > + reserved_words +=
> > + DIV_ROUND_UP(fp->entry_data_size, sizeof(long)) + 1;
> > + }
> > + node = first;
> > + if (reserved_words) {
> > + fgraph_data = fgraph_reserve_data(gops->idx, reserved_words * sizeof(long));
> > + if (unlikely(!fgraph_data)) {
> > + hlist_for_each_entry_from_rcu(node, hlist) {
> > + if (node->addr != func)
> > + break;
> > + fp = READ_ONCE(node->fp);
> > + if (fp && !fprobe_disabled(fp))
> > + fp->nmissed++;
> > + }
> > + return 0;
> > + }
> > + }
>
> this looks expensive compared to what we do now.. IIUC due to the graph
> ops limitations (16 active ops), you have just a single graph_ops for fprobe,
> and each fprobe registration stores its ips into a hash, which you need to
> search in here to get the registered callbacks..?
I don't think this is so expensive. In most cases, it only hits 1 fprobe in the
hash. And if the fprobe is only used to hook function entry, reserved_words == 0.
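
For illustration, here is a minimal userspace sketch of the reserved_words
arithmetic above. The struct and the sizes are made up; only the
DIV_ROUND_UP(entry_data_size, sizeof(long)) + 1 formula comes from the patch.

#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

struct fake_fprobe {
	int		has_exit_handler;
	unsigned long	entry_data_size;
};

int main(void)
{
	/* Three hypothetical fprobes hashed to the same function address. */
	struct fake_fprobe probes[] = {
		{ 0, 0 },	/* entry only: contributes nothing      */
		{ 1, 12 },	/* DIV_ROUND_UP(12, 8) + 1 = 3 words    */
		{ 1, 0 },	/* 0 + 1 = 1 word (per-fprobe header)   */
	};
	int reserved_words = 0;

	for (unsigned int i = 0; i < sizeof(probes) / sizeof(probes[0]); i++) {
		if (!probes[i].has_exit_handler)
			continue;	/* no exit handler, no stack data */
		reserved_words += DIV_ROUND_UP(probes[i].entry_data_size,
					       sizeof(long)) + 1;
	}
	printf("reserved_words = %d\n", reserved_words);	/* 4 on LP64 */
	return 0;
}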
> I wonder, would it make sense to allow an arbitrary number of active graph_ops,
> at the price that some callbacks might fail because there's no stack space, so
> each fprobe instance would have its own graph_ops.. and we would get rid
> of the code above (and below)?
Yeah, actually my first implementation was exactly that. But I realized it
doesn't work: it requires an intermediate object with a refcounter, because the
"marker" on the shadow stack will be left behind after unregistering. We need to
identify which markers are still valid and which are not. And for that purpose,
we would need to introduce a similar structure in fgraph too.
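
To show what I mean by the intermediate object, a hedged sketch with invented
names (only refcount_t and its helpers are the real kernel API):

#include <linux/refcount.h>
#include <linux/slab.h>

struct fprobe_marker {
	refcount_t	ref;	/* one reference per marker on a shadow stack */
	struct fprobe	*fp;	/* set to NULL when the fprobe is unregistered */
};

static struct fprobe_marker *fprobe_marker_get(struct fprobe_marker *m)
{
	refcount_inc(&m->ref);
	return m;
}

static void fprobe_marker_put(struct fprobe_marker *m)
{
	/* Freed only after the last shadow-stack marker is consumed,
	 * so the exit path can still tell whether fp is alive. */
	if (refcount_dec_and_test(&m->ref))
		kfree(m);
}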
The current multi-fgraph does:
- if CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n (called from the dedicated mcount
  asm code), it has to loop over all fgraph_ops and check each hash, which is
  inefficient, but it can easily push the return trace entry on the shadow
  stack.
- if CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y (called from the ftrace asm code),
  it does not need to loop (that is done by ftrace), but each handler does
  NOT know who pushed the return trace entry on the shadow stack. Thus it has
  to decode the shadow stack and check whether it needs to push a return
  trace entry or not. And this is hard if the traced function is a self-
  recursive call or a tail call. To detect such recursive calls, I introduced
  a bitmap entry on the shadow stack. The size of this bitmap limits the max
  number of fgraph_ops (see the rough model below).
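
A rough userspace model of that bitmap entry; the bit layout and the names are
invented here, not the actual fgraph encoding:

#include <stdio.h>

/* Pretend one shadow-stack word splits into metadata bits plus a
 * bitmap with one bit per fgraph_ops index. */
#define META_BITS	16
#define MAX_FGRAPH_OPS	((int)(sizeof(long) * 8) - META_BITS)

static long mark_traced(long word, int idx)
{
	return word | (1L << (META_BITS + idx));
}

static int was_traced(long word, int idx)
{
	return !!(word & (1L << (META_BITS + idx)));
}

int main(void)
{
	long entry = 0;

	entry = mark_traced(entry, 3);	/* fgraph_ops #3 traced this frame */
	/* On return, only handlers whose bit is set get called. */
	printf("ops3=%d ops5=%d max_ops=%d\n",
	       was_traced(entry, 3), was_traced(entry, 5), MAX_FGRAPH_OPS);
	return 0;
}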
So, to lift the limit on the number of fgraph_ops, we may need to push each
fgraph's own index on the shadow stack, and each fgraph callback would have to
unwind the shadow stack to check whether its own index is there, instead of
checking a bit in the bitmap. That would be a more reliable way, but maybe slow.
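
A hedged sketch of that alternative, with an invented frame layout; each traced
frame records the indices of the fgraph_ops which pushed it, and every callback
has to scan them:

#include <stdbool.h>

struct shadow_frame {
	unsigned long	ret_addr;
	int		nr_idx;	/* how many fgraph indices were pushed */
	int		idx[];	/* one slot per fgraph_ops tracing this frame */
};

/* O(nr_idx) scan per callback, versus an O(1) bitmap test. */
static bool frame_traced_by(const struct shadow_frame *f, int my_idx)
{
	for (int i = 0; i < f->nr_idx; i++)
		if (f->idx[i] == my_idx)
			return true;
	return false;
}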
Another option is introducing a pair of pre- and post-callbacks which are
called before and after the list/direct call of the ftrace_ops. The
pre-callback would push the ret_stack entry on the shadow stack, and the
post-callback would commit or cancel it. (But this one is hard to design...
it may become an ugly interface.)
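
Purely as a speculative sketch (no such hooks exist in ftrace today, and all
names here are invented):

#include <linux/types.h>

struct fgraph_txn_ops {
	/* Called before the ftrace_ops list/direct call: tentatively
	 * push a ret_stack entry and hand back a cookie for it. */
	void *(*pre)(unsigned long func, unsigned long ret_ip);
	/* Called after the call: commit the entry if some handler
	 * actually used it, otherwise cancel (pop) it. */
	void (*post)(void *cookie, bool used);
};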
Thank you,
--
Masami Hiramatsu (Google) <mhiramat@...nel.org>