lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250730103328.40bb6da5@gandalf.local.home>
Date: Wed, 30 Jul 2025 10:33:28 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Linux trace kernel
 <linux-trace-kernel@...r.kernel.org>, bpf@...r.kernel.org, Mathieu
 Desnoyers <mathieu.desnoyers@...icios.com>, Mark Rutland
 <mark.rutland@....com>, Peter Zijlstra <peterz@...radead.org>, Namhyung Kim
 <namhyung@...nel.org>, Takaya Saeki <takayas@...gle.com>, Douglas Raillard
 <douglas.raillard@....com>, Tom Zanussi <zanussi@...nel.org>, Andrew Morton
 <akpm@...ux-foundation.org>, Thomas Gleixner <tglx@...utronix.de>, Ian
 Rogers <irogers@...gle.com>, aahringo@...hat.com
Subject: Re: [PATCH] tracing/probes: Allow use of BTF names to dereference
 pointers

On Wed, 30 Jul 2025 22:53:40 +0900
Masami Hiramatsu (Google) <mhiramat@...nel.org> wrote:


> > If BTF is in the kernel, it can be used to find this with names, where the
> > user doesn't need to find the actual offset:
> > 
> >  # echo 'f:cache kmem_cache_alloc_noprof size=+kmem_cache.size($arg1):u32' >> dynamic_events  
> 
> Great! This is something like a evolution of assembler to "symbolic"
> assembler. ;)

Yep!

> 
> > 
> > Instead of the "+0x18", it would have "+kmem_cache.size" where the format is:
> > 
> >   +STRUCT.MEMBER[.MEMBER[..]]  
> 
> Yeah, and using '.' delimiter looks nice to me.

I know I initially suggested using ':' but then it hit me that a structure
uses '.' and that made a lot more sense. Also, it made parsing easier as I
had a hack to get around the ':' parsing that extracted the "type"
(like :u64 or :string)

> 
> > 
> > The delimiter is '.' and the first item is the structure name. Then the
> > member of the structure to get the offset of. If that member is an
> > embedded structure, another '.MEMBER' may be added to get the offset of
> > its members with respect to the original value.
> > 
> >   "+kmem_cache.size($arg1)" is equivalent to:
> > 
> >   (*(struct kmem_cache *)$arg1).size
> > 
> > Anonymous structures are also handled:
> > 
> >   # echo 'e:xmit net.net_dev_xmit +net_device.name(+sk_buff.dev($skbaddr)):string' >> dynamic_events  
> 
> So this only replaces the "offset" part. So we still need to use
> +OFFS() syntax for dereferencing the pointer.

Correct.


> > And produces:
> > 
> >        trace-cmd-1381    [002] ...1.  2082.676268: read: (filemap_readahead.isra.0+0x0/0x150) file="trace.dat"
> >   
> 
> OK, the desgin looks good to me. I have some comments below.

I'd expected as much ;-)

This is for the next merge window so we have plenty of time.

> >  
> > +#define BITS_ROUNDDOWN_BYTES(bits) ((bits) >> 3)
> > +
> > +static int find_member(const char *ptr, struct btf *btf,
> > +		       const struct btf_type **type, int level)
> > +{
> > +	const struct btf_member *member;
> > +	const struct btf_type *t = *type;
> > +	int i;
> > +
> > +	/* Max of 3 depth of anonymous structures */
> > +	if (level > 3)
> > +		return -1;  
> 
> Please return an error code, maybe this is -E2BIG?

OK.

I was thinking that perhaps we should update the error_log to handle these
cases too.

> 
> > +
> > +	for_each_member(i, t, member) {
> > +		const char *tname = btf_name_by_offset(btf, member->name_off);
> > +
> > +		if (strcmp(ptr, tname) == 0) {
> > +			*type = btf_type_by_id(btf, member->type);
> > +			return BITS_ROUNDDOWN_BYTES(member->offset);
> > +		}
> > +
> > +		/* Handle anonymous structures */
> > +		if (strlen(tname))
> > +			continue;
> > +
> > +		*type = btf_type_by_id(btf, member->type);
> > +		if (btf_type_is_struct(*type)) {
> > +			int offset = find_member(ptr, btf, type, level + 1);
> > +
> > +			if (offset < 0)
> > +				continue;
> > +
> > +			return offset + BITS_ROUNDDOWN_BYTES(member->offset);
> > +		}
> > +	}
> > +
> > +	return -1;  
> 
> 	return -ENOENT;

Ah. This return doesn't propagate up to the caller. It's a static function
and just returns "not found". Which means that the "-E2BIG" will not be used.

If we want the -E2BIG to be used as well, then we can return the result of
this function.


> 
> > +}
> > +
> > +/**
> > + * btf_find_offset - Find an offset of a member for a structure
> > + * @arg: A structure name followed by one or more members
> > + * @offset_p: A pointer to where to store the offset
> > + *
> > + * Will parse @arg with the expected format of: struct.member[[.member]..]
> > + * It is delimited by '.'. The first item must be a structure type.
> > + * The next are its members. If the member is also of a structure type it
> > + * another member may follow ".member".
> > + *
> > + * Note, @arg is modified but will be put back to what it was on return.
> > + *
> > + * Returns: 0 on success and -EINVAL if no '.' is present
> > + *    or -ENXIO if the structure or member is not found.
> > + *    Returns -EINVAL if BTF is not defined.
> > + *  On success, @offset_p will contain the offset of the member specified
> > + *    by @arg.
> > + */
> > +int btf_find_offset(char *arg, long *offset_p)
> > +{
> > +	const struct btf_type *t;
> > +	struct btf *btf;
> > +	long offset = 0;
> > +	char *ptr;
> > +	int ret;
> > +	s32 id;
> > +
> > +	ptr = strchr(arg, '.');
> > +	if (!ptr)
> > +		return -EINVAL;  
> 
> Instead of just returning error, can't we log an error for helping user?

Yeah, but that will require the caller of this to handle it. I rather not
have the probe error logging code put in this file.

That means the negative numbers will need to mean something so that the
trace probe logic can know what to report.

> 
> trace_probe_log_err(BYTE_OFFSET, ERROR_CODE);
> 
> The base offset is stored in ctx->offset, so you can use it.
> ERROR_CODE is defined in trace_probe.h. 
> 
> Maybe you can add something like
> 
> 	C(BAD_STRUCT_FMT,		"Symbolic offset must be +STRUCT.MEMBER format"),\
> 
> And for other cases, you can use
> 
> 	C(BAD_BTF_TID,		"Failed to get BTF type info."),\

OK, so the caller can handle this (see below).

> 
> > +
> > +	*ptr = '\0';
> > +
> > +	id = bpf_find_btf_id(arg, BTF_KIND_STRUCT, &btf);
> > +	if (id < 0)
> > +		goto error;
> > +
> > +	/* Get BTF_KIND_FUNC type */
> > +	t = btf_type_by_id(btf, id);
> > +
> > +	/* May allow more than one member, as long as they are structures */
> > +	do {
> > +		if (!t || !btf_type_is_struct(t))
> > +			goto error;
> > +
> > +		*ptr++ = '.';
> > +		arg = ptr;
> > +		ptr = strchr(ptr, '.');
> > +		if (ptr)
> > +			*ptr = '\0';
> > +
> > +		ret = find_member(arg, btf, &t, 0);
> > +		if (ret < 0)
> > +			goto error;
> > +
> > +		offset += ret;
> > +
> > +	} while (ptr);
> > +
> > +	*offset_p = offset;
> > +	return 0;
> > +
> > +error:
> > +	if (ptr)
> > +		*ptr = '.';
> > +	return -ENXIO;
> > +}
> > diff --git a/kernel/trace/trace_btf.h b/kernel/trace/trace_btf.h
> > index 4bc44bc261e6..7b0797a6050b 100644
> > --- a/kernel/trace/trace_btf.h
> > +++ b/kernel/trace/trace_btf.h
> > @@ -9,3 +9,13 @@ const struct btf_member *btf_find_struct_member(struct btf *btf,
> >  						const struct btf_type *type,
> >  						const char *member_name,
> >  						u32 *anon_offset);
> > +
> > +#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
> > +/* Will modify arg, but will put it back before returning. */
> > +int btf_find_offset(char *arg, long *offset);
> > +#else
> > +static inline int btf_find_offset(char *arg, long *offset)
> > +{  
> 
> Here also should use 
> 
> 	C(NOSUP_BTFARG,		"BTF is not available or not supported"),	\
> 

Again, this should be from the caller.

> 
> Thank you,
> 
> > +	return -EINVAL;

So this can return -ENODEV;


> > +}
> > +#endif
> > diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> > index 424751cdf31f..4c13e51ea481 100644
> > --- a/kernel/trace/trace_probe.c
> > +++ b/kernel/trace/trace_probe.c
> > @@ -1137,7 +1137,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
> >  
> >  	case '+':	/* deref memory */
> >  	case '-':
> > -		if (arg[1] == 'u') {
> > +		if (arg[1] == 'u' && isdigit(arg[2])) {
> >  			deref = FETCH_OP_UDEREF;
> >  			arg[1] = arg[0];
> >  			arg++;
> > @@ -1150,7 +1150,10 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
> >  			return -EINVAL;
> >  		}
> >  		*tmp = '\0';
> > -		ret = kstrtol(arg, 0, &offset);
> > +		if (arg[0] != '-' && !isdigit(*arg))
> > +			ret = btf_find_offset(arg, &offset);

Here we could have:

			if (ret < 0) {
				int err = 0;
				switch (ret) {
				case -ENODEV: err = NOSUP_BTFARG; break;
				case -E2BIG: err = MEMBER_TOO_DEEP; break;
				case -EINVAL: err = BAD_STRUCT_FMT; break;
				case -ENXIO: err = BAD_BTF_TID; break;
				}
				if (err)
					trace_probe_log_err(ctx->offset, err);
				return ret;
			}

-- Steve

			


> > +		else
> > +			ret = kstrtol(arg, 0, &offset);
> >  		if (ret) {
> >  			trace_probe_log_err(ctx->offset, BAD_DEREF_OFFS);
> >  			break;
> > -- 
> > 2.47.2
> >   
> 
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ