lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20241218133126.3667d7b1@gandalf.local.home>
Date: Wed, 18 Dec 2024 13:31:26 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>, Daniel Borkmann
 <daniel@...earbox.net>, Florent Revest <revest@...gle.com>, LKML
 <linux-kernel@...r.kernel.org>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH] vsprintf: simplify number handling

On Wed, 18 Dec 2024 13:04:27 -0500
Steven Rostedt <rostedt@...dmis.org> wrote:

> > It's very much a part of the standard printf format, and is very much
> > inherent to the whole varargs and C integer promotion rules (ie you
> > literally *cannot* pass an actual 'char' value to a varargs function -
> > the normal C integer type extension rules apply).
> > 
> > So this is not really some odd kernel extension, and while there are
> > only a handful of users in tracing (that apparently trace-cmd cannot
> > deal with), it's not even _that_ uncommon in general:  
> 
> trace-cmd (and libtraceevent for that matter) does handle "%h" and "%hh"
> as well.
> 
> But the vbin_printf() which trace_printk() uses is a different beast, and
> requires rebuilding the arguments so that it can be parsed, and there "%h"
> isn't supported.

Just to state the difference between TP_printk() and trace_printk() is that
with trace events only the data is saved to the ring buffer. For example,
for the sched_waking trace event:

TRACE_EVENT(sched_waking,

	TP_PROTO(struct task_struct *p),

	TP_ARGS(__perf_task(p)),

	TP_STRUCT__entry(
		__array(	char,	comm,	TASK_COMM_LEN	)
		__field(	pid_t,	pid			)
		__field(	int,	prio			)
		__field(	int,	target_cpu		)
	),

	TP_fast_assign(
		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
		__entry->pid		= p->pid;
		__entry->prio		= p->prio; /* XXX SCHED_DEADLINE */
		__entry->target_cpu	= task_cpu(p);
	),

	TP_printk("comm=%s pid=%d prio=%d target_cpu=%03d",
		  __entry->comm, __entry->pid, __entry->prio,
		  __entry->target_cpu)
);

[ Note, I made this into a TRACE_EVENT() but in reality it's multiple
  events that uses DECLARE_EVENT_CLASS() and DEFINE_EVENT(), but the idea is
  still the same. ]

	TP_STRUCT__entry(
		__array(	char,	comm,	TASK_COMM_LEN	)
		__field(	pid_t,	pid			)
		__field(	int,	prio			)
		__field(	int,	target_cpu		)
	),

That turns into:

	struct trace_event_raw_sched_waking {
		struct trace_entry		ent;
		char				comm[TASK_COMM_LEN];
		pid_t				pid;
		int				prio;
		int				target_cpu;
		char				__data[];
	}

Then we have how to load that structure:

	TP_fast_assign(
		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
		__entry->pid		= p->pid;
		__entry->prio		= p->prio; /* XXX SCHED_DEADLINE */
		__entry->target_cpu	= task_cpu(p);
	),
 
Where the "__entry" is of type struct trace_event_raw_sched_waking and
points into the reserved location in the ring buffer. This has the above
assignments write directly into the ring buffer and avoids any copying.

Now the "trace" file needs to know how to print it, that's where the
TP_printk() is. It is basically a vsprintf(TP_printk()) with the __entry
again pointing to the content of the ring buffer.

But trace_printk() is not a trace event and requires writing the format as
well as the data into the ring buffer when it is called. It use to simply
just use vsnprintf() but it was considered much faster to not do the
formatting during the record and to push it back to when it is read. As
trace_printk() is used specifically to find hard to hit bugs, to keep it
from causing "heisenbugs", using vbin_printf() proved to be much faster and
made trace_printk() less intrusive to debugging.

For historical analysis, here's where it was first introduced:

  https://lore.kernel.org/lkml/49aa0c73.1c07d00a.4fc6.ffffb4d7@mx.google.com/

-- Steve

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ