lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20210817205231.0b1d2c395c95196b2129ed5c@kernel.org>
Date:   Tue, 17 Aug 2021 20:52:31 +0900
From:   Masami Hiramatsu <mhiramat@...nel.org>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@...il.com>,
        linux-trace-devel@...r.kernel.org, linux-kernel@...r.kernel.org,
        tom.zanussi@...ux.intel.com
Subject: Re: [PATCH v4] [RFC] trace: Add kprobe on tracepoint

On Mon, 16 Aug 2021 17:40:18 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:

> On Fri, 13 Aug 2021 00:44:48 +0900
> Masami Hiramatsu <mhiramat@...nel.org> wrote:
> 
> Tzvetomir's on PTO so I'm helping out (trying to get this into the next
> merge window).
> 
> > Hi,
> > 
> > Here is my code review.
> > 
> > On Wed, 11 Aug 2021 17:14:33 +0300
> > "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@...il.com> wrote:
> > [...]
> > > +
> > > +static struct trace_eprobe *to_trace_eprobe(struct dyn_event *ev)
> > > +{
> > > +	return container_of(ev, struct trace_eprobe, devent);
> > > +}
> > > +
> > > +static int trace_eprobe_find(struct trace_eprobe *ep)  
> > 
> > This function name is a bit easy to mislead. If I were you,
> > I call it 'trace_eprobe_setup_event()'.
> 
> I agree the name is misleading.
> 
> > Or, I'll make it returns 'struct trace_event_call *' and set
> > ep->event outside of this function. Moreover, I recommend you
> > to call this before alloc_event_probe() and pass the result to it.
> > e.g.
> > 
> > ... /* parse the target system and event */
> > 
> > event_call = find_and_get_event(sys_name, sys_event);
> > if (!event_call)
> >    goto error;
> > 
> > ep = alloc_event_probe(group, event, sys_name, sys_event, event_call, argc - 2);
> > if (IS_ERR(ep))
> 
> To do the above, we'll need to take the event_mutex over both calls. I
> can try that, and see what the lockdep fallout is ;-)

OK, I got it. Then at least change the name please.
("trace_eprobe_setup_event"?)

> 
> 
> >    ...
> > 
> > > +{
> > > +	struct trace_event_call *tp_event;
> > > +	int ret = -ENOENT;
> > > +	const char *name;
> > > +
> > > +	mutex_lock(&event_mutex);
> > > +	list_for_each_entry(tp_event, &ftrace_events, list) {
> > > +		if (tp_event->flags & TRACE_EVENT_FL_IGNORE_ENABLE)
> > > +			continue;
> > > +		if (!tp_event->class->system ||
> > > +		    strcmp(ep->event_system, tp_event->class->system))
> > > +			continue;
> > > +		name = trace_event_name(tp_event);
> > > +		if (!name ||
> > > +		    strcmp(ep->event_name, name))
> > > +			continue;
> > > +		if (!try_module_get(tp_event->mod)) {
> > > +			ret = -ENODEV;
> > > +			break;
> > > +		}  
> > 
> > BTW, this can lock the static events (because the module in where the
> > event is locked), but can not lock the other dynamic events.
> > Maybe we need 2 more patches.
> > 
> > 1) introduce TRACE_EVENT_FL_PROBE and set the flag in alloc_trace_k/uprobe().
> >    (and eprobe will skip it)
> 
> I'm fine with what you suggest here.
> 
> 
> > 2) introduce refcount for the trace_event_call, which prevent removing
> >    synthetic event.
> 
> I rather not add another field to trace_event_call. There's one for
> every event, which we have over a thousand events. Every field we add
> to that structure adds more bloat to the kernel memory footprint.

Indeed.

> Perhaps we can have a dynamic event structure that encapsulates the
> trace_event_call used by kprobes, et.al. and add the counter there? I
> would require changes like:
> 
> static inline struct dymanic_event_call *
> dynamic_event_from_trace_event(struct trace_event_call *call)
> {
> 	WARN_ON_ONCE(!(call->flags & TRACE_EVENT_FL_DYN_EVENT));
> 	return container_of(call, struct dynamic_event_call, event);
> }
> 
> [..]
> 
> static inline struct trace_probe_event *
> trace_probe_event_from_call(struct trace_event_call *event_call)
> {
> 	struct dynamic_event_call *de = dynamic_event_from_trace_event(call);
> 	return container_of(event_call, struct trace_probe_event, de);
> }
> 
> Or maybe I can make the mod a union. As it's not needed for dynamic
> events, perhaps I can use the void *mod as:
> 
> struct trace_event_call {
> 	[..]
> 	union {
> 		void		*mod;
> 		atomic_t	dynamic_ref_count;
> 	};
> 
> ??
>

That's a good idea :)
 
> If we place a kprobe on module text, I'm guessing the kprobe itself
> handles that module logic, we don't need to do it on the trace event.
> 
> > 
> > 
> > > +		ep->event = tp_event;
> > > +		ret = 0;
> > > +		break;
> > > +	}
> > > +	mutex_unlock(&event_mutex);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static int eprobe_dyn_event_create(const char *raw_command)
> > > +{
> > > +	return trace_probe_create(raw_command, __trace_eprobe_create);
> > > +}
> > > +  
> > [...]
> > > +
> > > +static struct trace_eprobe *alloc_event_probe(const char *group,
> > > +					      const char *event,
> > > +					      const char *sys_name,
> > > +					      const char *sys_event,
> > > +					      int nargs)
> > > +{
> > > +	struct trace_eprobe *ep;
> > > +	int ret = -ENOMEM;
> > > +
> > > +	ep = kzalloc(SIZEOF_TRACE_EPROBE(nargs), GFP_KERNEL);
> > > +	if (!ep)
> > > +		goto error;
> > > +	ep->event_name = kstrdup(sys_event, GFP_KERNEL);
> > > +	if (!ep->event_name)
> > > +		goto error;
> > > +	ep->event_system = kstrdup(sys_name, GFP_KERNEL);
> > > +	if (!ep->event_system)
> > > +		goto error;
> > > +
> > > +	ret = trace_probe_init(&ep->tp, event, group, false);
> > > +	if (ret < 0)
> > > +		goto error;
> > > +
> > > +	dyn_event_init(&ep->devent, &eprobe_dyn_event_ops);
> > > +	return ep;
> > > +error:
> > > +	trace_event_probe_cleanup(ep);
> > > +	return ERR_PTR(ret);
> > > +}
> > > +
> > > +static int trace_eprobe_tp_arg_find(struct trace_eprobe *ep, int i)  
> > 
> > I think 'trace_eprobe_tp_arg_update()' will be better name.
> > Also, 'if (ep->tp.args[i].code->op == FETCH_OP_TP_ARG)' check
> > is moved inside in this function for hiding inside of the tp.args.
> 
> Will update.
> 
> > 
> > > +{
> > > +	struct probe_arg *parg = &ep->tp.args[i];
> > > +	struct ftrace_event_field *field;
> > > +	struct list_head *head;
> > > +
> > > +	head = trace_get_fields(ep->event);
> > > +	list_for_each_entry(field, head, link) {
> > > +		if (!strcmp(parg->code->data, field->name)) {
> > > +			kfree(parg->code->data);
> > > +			parg->code->data = field;
> > > +			return 0;
> > > +		}
> > > +	}
> > > +	kfree(parg->code->data);
> > > +	parg->code->data = NULL;
> > > +	return -ENOENT;
> > > +}
> > > +
> > > +static int kprobe_event_define_fields(struct trace_event_call *event_call)  
> > 
> > you meant 'eprobe_event_define_fields()' ? :)
> 
> Yes, and I'll also rename the subject of the patch. As we are adding
> probes to tracepoints not kprobes.
> 
> > 
> > > +{
> > > +	int ret;
> > > +	struct kprobe_trace_entry_head field;
> > > +	struct trace_probe *tp;
> > > +
> > > +	tp = trace_probe_primary_from_call(event_call);
> > > +	if (WARN_ON_ONCE(!tp))
> > > +		return -ENOENT;
> > > +
> > > +	DEFINE_FIELD(unsigned long, ip, FIELD_STRING_IP, 0);  
> > 
> > Would you really need this 'ip' field? I think you can record the
> 
> No we do not. Will remove.
> 
> > original event ID (call->event.type) instead of ip.
> 
> Good idea about the "id" instead of "ip".
> 
> > 
> > > +
> > > +	return traceprobe_define_arg_fields(event_call, sizeof(field), tp);
> > > +}
> > > +
> > > +static struct trace_event_fields eprobe_fields_array[] = {
> > > +	{ .type = TRACE_FUNCTION_TYPE,
> > > +	  .define_fields = kprobe_event_define_fields },  
> > 
> > Ditto.
> > 
> 
> Agreed.
> 
> > > +	{}
> > > +};
> > > +
> > > +/* Event entry printers */
> > > +static enum print_line_t
> > > +print_eprobe_event(struct trace_iterator *iter, int flags,
> > > +		   struct trace_event *event)
> > > +{
> > > +	struct kprobe_trace_entry_head *field;
> > > +	struct trace_seq *s = &iter->seq;
> > > +	struct trace_probe *tp;
> > > +
> > > +	field = (struct kprobe_trace_entry_head *)iter->ent;
> > > +	tp = trace_probe_primary_from_call(
> > > +		container_of(event, struct trace_event_call, event));
> > > +	if (WARN_ON_ONCE(!tp))
> > > +		goto out;
> > > +
> > > +	trace_seq_printf(s, "%s: (", trace_probe_name(tp));
> > > +
> > > +	if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET))
> > > +		goto out;  
> > 
> > Here, you can show the original event name from the event ID.
> 
> OK.
> 
> > 
> > > +
> > > +	trace_seq_putc(s, ')');
> > > +
> > > +	if (print_probe_args(s, tp->args, tp->nr_args,
> > > +			     (u8 *)&field[1], field) < 0)
> > > +		goto out;
> > > +
> > > +	trace_seq_putc(s, '\n');
> > > + out:
> > > +	return trace_handle_return(s);
> > > +}
> > > +
> > > +static unsigned long get_event_field(struct fetch_insn *code, void *rec)
> > > +{
> > > +	struct ftrace_event_field *field = code->data;
> > > +	unsigned long val;
> > > +	void *addr;
> > > +
> > > +	addr = rec + field->offset;
> > > +
> > > +	switch (field->size) {
> > > +	case 1:
> > > +		if (field->is_signed)
> > > +			val = *(char *)addr;
> > > +		else
> > > +			val = *(unsigned char *)addr;
> > > +		break;
> > > +	case 2:
> > > +		if (field->is_signed)
> > > +			val = *(short *)addr;
> > > +		else
> > > +			val = *(unsigned short *)addr;
> > > +		break;
> > > +	case 4:
> > > +		if (field->is_signed)
> > > +			val = *(int *)addr;
> > > +		else
> > > +			val = *(unsigned int *)addr;
> > > +		break;
> > > +	default:
> > > +		if (field->is_signed)
> > > +			val = *(long *)addr;
> > > +		else
> > > +			val = *(unsigned long *)addr;
> > > +		break;
> > > +	}
> > > +	return val;
> > > +}
> > > +
> > > +static int get_eprobe_size(struct trace_probe *tp, void *rec)
> > > +{
> > > +	struct probe_arg *arg;
> > > +	int i, len, ret = 0;
> > > +
> > > +	for (i = 0; i < tp->nr_args; i++) {
> > > +		arg = tp->args + i;
> > > +		if (unlikely(arg->dynamic)) {
> > > +			unsigned long val;
> > > +
> > > +			val = get_event_field(arg->code, rec);
> > > +			len = process_fetch_insn_bottom(arg->code + 1, val, NULL, NULL);
> > > +			if (len > 0)
> > > +				ret += len;
> > > +		}
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +/* Kprobe specific fetch functions */  
> > [...]
> > > +static nokprobe_inline int
> > > +probe_mem_read(void *dest, void *src, size_t size)
> > > +{
> > > +#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> > > +	if ((unsigned long)src < TASK_SIZE)
> > > +		return probe_mem_read_user(dest, src, size);
> > > +#endif
> > > +	return copy_from_kernel_nofault(dest, src, size);
> > > +}  
> > 
> > Hmm, these "fetch_args for kernel" APIs should finally unified with
> > kprobe events. But at this step, this is good.
> 
> Agreed. For later patches though.
> 
> > 
> > > +
> > > +/* eprobe handler */
> > > +static inline void
> > > +__eprobe_trace_func(struct eprobe_data *edata, void *rec)
> > > +{
> > > +	struct kprobe_trace_entry_head *entry;
> > > +	struct trace_event_call *call = trace_probe_event_call(&edata->ep->tp);
> > > +	struct trace_event_buffer fbuffer;
> > > +	int dsize;
> > > +
> > > +	WARN_ON(call != edata->file->event_call);
> > > +
> > > +	if (trace_trigger_soft_disabled(edata->file))
> > > +		return;
> > > +
> > > +	fbuffer.trace_ctx = tracing_gen_ctx();
> > > +	fbuffer.trace_file = edata->file;
> > > +
> > > +	dsize = get_eprobe_size(&edata->ep->tp, rec);
> > > +	fbuffer.regs = NULL;
> > > +
> > > +	fbuffer.event =
> > > +		trace_event_buffer_lock_reserve(&fbuffer.buffer, edata->file,
> > > +					call->event.type,
> > > +					sizeof(*entry) + edata->ep->tp.size + dsize,
> > > +					fbuffer.trace_ctx);
> > > +	if (!fbuffer.event)
> > > +		return;
> > > +
> > > +	entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
> > > +	entry->ip = 0;  
> > 
> > Here, you can trace edata->ep->event->event.type instead of 0.
> 
> OK.
> 
> > 
> > > +	store_trace_args(&entry[1], &edata->ep->tp, rec, sizeof(*entry), dsize);
> > > +
> > > +	trace_event_buffer_commit(&fbuffer);
> > > +}
> > > +
> > > +/*
> > > + * The event probe implementation uses event triggers to get access to
> > > + * the event it is attached to, but is not an actual trigger. The below
> > > + * functions are just stubs to fulfill what is needed to use the trigger
> > > + * infrastructure.
> > > + */  
> > 
> > OK, I got it. So eprobe is implemented on the trigger action framework.
> > 
> > [...]
> > > +
> > > +static int enable_eprobe(struct trace_eprobe *ep,
> > > +			 struct trace_event_file *eprobe_file)
> > > +{
> > > +	struct event_trigger_data *trigger;
> > > +	struct trace_event_file *file;
> > > +	struct trace_array *tr = eprobe_file->tr;
> > > +
> > > +	file = find_event_file(tr, ep->event_system, ep->event_name);
> > > +	if (!file)
> > > +		return -ENOENT;
> > > +	trigger = new_eprobe_trigger(ep, eprobe_file);
> > > +	if (IS_ERR(trigger))
> > > +		return PTR_ERR(trigger);
> > > +
> > > +	list_add_tail_rcu(&trigger->list, &file->triggers);
> > > +
> > > +	trace_event_trigger_enable_disable(file, 1);
> > > +	update_cond_flag(file);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static struct trace_event_functions eprobe_funcs = {
> > > +	.trace		= print_eprobe_event
> > > +};
> > > +
> > > +static int disable_eprobe(struct trace_eprobe *ep,
> > > +			  struct trace_array *tr)
> > > +{
> > > +	struct event_trigger_data *trigger;
> > > +	struct trace_event_file *file;
> > > +	struct eprobe_data *edata;
> > > +
> > > +	file = find_event_file(tr, ep->event_system, ep->event_name);
> > > +	if (!file)
> > > +		return -ENOENT;
> > > +
> > > +	list_for_each_entry(trigger, &file->triggers, list) {
> > > +		if (!(trigger->flags & EVENT_TRIGGER_FL_PROBE))
> > > +			continue;
> > > +		edata = trigger->private_data;
> > > +		if (edata->ep == ep)
> > > +			break;
> > > +	}
> > > +	if (list_entry_is_head(trigger, &file->triggers, list))
> > > +		return -ENODEV;
> > > +
> > > +	list_del_rcu(&trigger->list);
> > > +
> > > +	trace_event_trigger_enable_disable(file, 0);
> > > +	update_cond_flag(file);
> > > +	return 0;
> > > +}
> > > +
> > > +static int enable_trace_eprobe(struct trace_event_call *call,
> > > +			       struct trace_event_file *file)
> > > +{
> > > +	struct trace_probe *pos, *tp;
> > > +	struct trace_eprobe *ep;
> > > +	bool enabled;
> > > +	int ret = 0;
> > > +
> > > +	tp = trace_probe_primary_from_call(call);
> > > +	if (WARN_ON_ONCE(!tp))
> > > +		return -ENODEV;
> > > +	enabled = trace_probe_is_enabled(tp);
> > > +
> > > +	/* This also changes "enabled" state */
> > > +	if (file) {
> > > +		ret = trace_probe_add_file(tp, file);
> > > +		if (ret)
> > > +			return ret;
> > > +	} else
> > > +		trace_probe_set_flag(tp, TP_FLAG_PROFILE);
> > > +
> > > +	if (enabled)
> > > +		return 0;
> > > +
> > > +	list_for_each_entry(pos, trace_probe_probe_list(tp), list) {
> > > +		ep = container_of(pos, struct trace_eprobe, tp);
> > > +		ret = enable_eprobe(ep, file);
> > > +		if (ret)
> > > +			break;
> > > +		enabled = true;
> > > +	}
> > > +
> > > +	if (ret) {
> > > +		/* Failed to enable one of them. Roll back all */
> > > +		if (enabled)
> > > +			disable_eprobe(ep, file->tr);
> > > +		if (file)
> > > +			trace_probe_remove_file(tp, file);
> > > +		else
> > > +			trace_probe_clear_flag(tp, TP_FLAG_PROFILE);
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static int disable_trace_eprobe(struct trace_event_call *call,
> > > +				struct trace_event_file *file)
> > > +{
> > > +	struct trace_probe *pos, *tp;
> > > +	struct trace_eprobe *ep;
> > > +
> > > +	tp = trace_probe_primary_from_call(call);
> > > +	if (WARN_ON_ONCE(!tp))
> > > +		return -ENODEV;
> > > +
> > > +	if (file) {
> > > +		if (!trace_probe_get_file_link(tp, file))
> > > +			return -ENOENT;
> > > +		if (!trace_probe_has_single_file(tp))
> > > +			goto out;
> > > +		trace_probe_clear_flag(tp, TP_FLAG_TRACE);
> > > +	} else
> > > +		trace_probe_clear_flag(tp, TP_FLAG_PROFILE);
> > > +
> > > +	if (!trace_probe_is_enabled(tp)) {
> > > +		list_for_each_entry(pos, trace_probe_probe_list(tp), list) {
> > > +			ep = container_of(pos, struct trace_eprobe, tp);
> > > +			disable_eprobe(ep, file->tr);
> > > +		}
> > > +	}
> > > +
> > > + out:
> > > +	if (file)
> > > +		/*
> > > +		 * Synchronization is done in below function. For perf event,
> > > +		 * file == NULL and perf_trace_event_unreg() calls
> > > +		 * tracepoint_synchronize_unregister() to ensure synchronize
> > > +		 * event. We don't need to care about it.
> > > +		 */
> > > +		trace_probe_remove_file(tp, file);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int eprobe_register(struct trace_event_call *event,
> > > +			   enum trace_reg type, void *data)
> > > +{
> > > +	struct trace_event_file *file = data;
> > > +
> > > +	switch (type) {
> > > +	case TRACE_REG_REGISTER:
> > > +		return enable_trace_eprobe(event, file);
> > > +	case TRACE_REG_UNREGISTER:
> > > +		return disable_trace_eprobe(event, file);
> > > +#ifdef CONFIG_PERF_EVENTS
> > > +	case TRACE_REG_PERF_REGISTER:
> > > +	case TRACE_REG_PERF_UNREGISTER:  
> > 
> > Doesn't this support perf? In that case, you can simplify enable_trace_eprobe()
> > and disable_trace_eprobe(), because 'file' is always not NULL.
> 
> It should support perf. We just haven't tested it yet ;-)

If you add an event probe, you can use that event from perf record etc.

> 
> Perhaps we'll add perf support as a separate patch.

Yes, that makes review easier. 

> 
> > 
> > > +	case TRACE_REG_PERF_OPEN:
> > > +	case TRACE_REG_PERF_CLOSE:
> > > +	case TRACE_REG_PERF_ADD:
> > > +	case TRACE_REG_PERF_DEL:
> > > +		return 0;
> > > +#endif
> > > +	}
> > > +	return 0;
> > > +}
> > > +
> > > +static inline void init_trace_eprobe_call(struct trace_eprobe *ep)
> > > +{
> > > +	struct trace_event_call *call = trace_probe_event_call(&ep->tp);
> > > +
> > > +	call->flags = TRACE_EVENT_FL_EPROBE;
> > > +	call->event.funcs = &eprobe_funcs;
> > > +	call->class->fields_array = eprobe_fields_array;
> > > +	call->class->reg = eprobe_register;
> > > +}
> > > +
> > > +static int __trace_eprobe_create(int argc, const char *argv[])
> > > +{
> > > +	/*
> > > +	 * Argument syntax:
> > > +	 *      e[:[GRP/]ENAME] SYSTEM.EVENT [FETCHARGS]  
> > 
> > Ah, OK. ENAME is also omittable.
> > 
> > 
> > > +	 * Fetch args:
> > > +	 *  <name>=$<field>[:TYPE]
> > > +	 */
> > > +	const char *event = NULL, *group = EPROBE_EVENT_SYSTEM;
> > > +	unsigned int flags = TPARG_FL_KERNEL | TPARG_FL_TPOINT;
> > > +	const char *sys_event = NULL, *sys_name = NULL;
> > > +	struct trace_eprobe *ep = NULL;
> > > +	char buf1[MAX_EVENT_NAME_LEN];
> > > +	char buf2[MAX_EVENT_NAME_LEN];
> > > +	char *tmp = NULL;
> > > +	int ret = 0;
> > > +	int i;
> > > +
> > > +	if (argc < 2)
> > > +		return -ECANCELED;
> > > +
> > > +	trace_probe_log_init("event_probe", argc, argv);
> > > +
> > > +	event = strchr(&argv[0][1], ':');
> > > +	if (event) {
> > > +		event++;
> > > +		ret = traceprobe_parse_event_name(&event, &group, buf1,
> > > +						  event - argv[0], '/');
> > > +		if (ret)
> > > +			goto parse_error;
> > > +	} else {
> > > +		strscpy(buf1, argv[1], MAX_EVENT_NAME_LEN);
> > > +		sanitize_event_name(buf1);
> > > +		event = buf1;
> > > +	}
> > > +	if (!is_good_name(event) || !is_good_name(group))
> > > +		goto parse_error;
> > > +
> > > +	sys_event = argv[1];
> > > +	ret = traceprobe_parse_event_name(&sys_event, &sys_name, buf2,
> > > +					  sys_event - argv[1], '.');
> > > +	if (ret || !sys_name)
> > > +		goto parse_error;
> > > +	if (!is_good_name(sys_event) || !is_good_name(sys_name))
> > > +		goto parse_error;
> > > +	ep = alloc_event_probe(group, event, sys_name, sys_event, argc - 2);
> > > +	if (IS_ERR(ep)) {
> > > +		ret = PTR_ERR(ep);
> > > +		/* This must return -ENOMEM, else there is a bug */
> > > +		WARN_ON_ONCE(ret != -ENOMEM);
> > > +		goto error;	/* We know ep is not allocated */
> > > +	}
> > > +	ret = trace_eprobe_find(ep);
> > > +	if (ret)
> > > +		goto error;  
> > 
> > As I said above, this can be called before "alloc_event_probe()".
> 
> I'll take a look at implementing that.

OK.

> 
> > 
> > > +
> > > +	argc -= 2; argv += 2;
> > > +	/* parse arguments */
> > > +	for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
> > > +		tmp = kstrdup(argv[i], GFP_KERNEL);
> > > +		if (!tmp) {
> > > +			ret = -ENOMEM;
> > > +			goto error;
> > > +		}
> > > +		ret = traceprobe_parse_probe_arg(&ep->tp, i, tmp, flags);
> > > +		if (ret == -EINVAL)
> > > +			kfree(tmp);
> > > +		if (ret)
> > > +			goto error;	/* This can be -ENOMEM */
> > > +		if (ep->tp.args[i].code->op == FETCH_OP_TP_ARG) {
> > > +			ret = trace_eprobe_tp_arg_find(ep, i);  
> > 
> > Here, as I said above too, below code will be better encapsulated.
> > 
> >   /* (code->op check is done inside below function) */
> >  ret = trace_eprobe_tp_arg_update(ep, i);
> >  if (ret)
> >     goto error;
> 
> OK.
> 
> > 
> > > +			if (ret)
> > > +				goto error;
> > > +		}
> > > +	}
> > > +	ret = traceprobe_set_print_fmt(&ep->tp, false);
> > > +	if (ret < 0)
> > > +		goto error;
> > > +	init_trace_eprobe_call(ep);
> > > +	mutex_lock(&event_mutex);
> > > +	ret = trace_probe_register_event_call(&ep->tp);
> > > +	if (ret)
> > > +		goto out_unlock;
> > > +	ret = dyn_event_add(&ep->devent);
> > > +out_unlock:
> > > +	mutex_unlock(&event_mutex);
> > > +	return ret;
> > > +
> > > +parse_error:
> > > +	ret = -EINVAL;
> > > +error:
> > > +	trace_event_probe_cleanup(ep);
> > > +	return ret;
> > > +}
> > > +
> > > +/*
> > > + * Register dynevent at core_initcall. This allows kernel to setup eprobe
> > > + * events in postcore_initcall without tracefs.
> > > + */
> > > +static __init int trace_events_eprobe_init_early(void)
> > > +{
> > > +	int err = 0;
> > > +
> > > +	err = dyn_event_register(&eprobe_dyn_event_ops);
> > > +	if (err)
> > > +		pr_warn("Could not register eprobe_dyn_event_ops\n");
> > > +
> > > +	return err;
> > > +}
> > > +core_initcall(trace_events_eprobe_init_early);
> > > diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> > > index 949ef09dc537..45f5392fb35c 100644
> > > --- a/kernel/trace/trace_events_hist.c
> > > +++ b/kernel/trace/trace_events_hist.c
> > > @@ -66,7 +66,8 @@
> > >  	C(EMPTY_SORT_FIELD,	"Empty sort field"),			\
> > >  	C(TOO_MANY_SORT_FIELDS,	"Too many sort fields (Max = 2)"),	\
> > >  	C(INVALID_SORT_FIELD,	"Sort field must be a key or a val"),	\
> > > -	C(INVALID_STR_OPERAND,	"String type can not be an operand in expression"),
> > > +	C(INVALID_STR_OPERAND,	"String type can not be an operand in expression"),  \
> > > +	C(SYNTH_ON_EPROBE,	"Synthetic event on eprobe is not supported"),  
> > 
> > As I and Steve discussed, I think this can be allowed and loops should be
> > detected and avoided in the other way. Anyway, this part will be better in
> > the separated patch.
> 
> Agreed.
> 
> > [...]
> > 
> > > diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> > > index 15413ad7cef2..5a97317e91fb 100644
> > > --- a/kernel/trace/trace_probe.c
> > > +++ b/kernel/trace/trace_probe.c
> > > @@ -227,12 +227,12 @@ int traceprobe_split_symbol_offset(char *symbol, long *offset)
> > >  
> > >  /* @buf must has MAX_EVENT_NAME_LEN size */
> > >  int traceprobe_parse_event_name(const char **pevent, const char **pgroup,
> > > -				char *buf, int offset)
> > > +				char *buf, int offset, int delim)
> > >  {
> > >  	const char *slash, *event = *pevent;
> > >  	int len;
> > >  
> > > -	slash = strchr(event, '/');
> > > +	slash = strchr(event, delim);  
> > 
> > As I pointed another mail, I'm OK to use both '/' and '.' as delimiter for
> > kprobes/uprobes too. But it must be a separated patch.
> 
> I'll update.
> 
> Thanks for taking the time with your review, it's been very valuable.

Thank you!

> 
> -- Steve


-- 
Masami Hiramatsu <mhiramat@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ