linux-kernel - Re: [PATCH v2] tracing: Add system trigger file to enable triggers for all the system's events

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20251126093920.30763e89@gandalf.local.home>
Date: Wed, 26 Nov 2025 09:39:20 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Linux Trace Kernel
 <linux-trace-kernel@...r.kernel.org>, Mathieu Desnoyers
 <mathieu.desnoyers@...icios.com>, Tom Zanussi <zanussi@...nel.org>
Subject: Re: [PATCH v2] tracing: Add system trigger file to enable triggers
 for all the system's events

On Wed, 26 Nov 2025 15:02:51 +0900
Masami Hiramatsu (Google) <mhiramat@...nel.org> wrote:

> > This also allows to remove a trigger from all events in a subsystem (even
> > if it's not a subsystem trigger!).
> >   
> 
> I have some comments below.

BTW, it's more appropriate to simply trim the email ;-)


> > --- a/kernel/trace/trace_events.c
> > +++ b/kernel/trace/trace_events.c
> > @@ -2168,51 +2168,52 @@ event_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
> >  
> >  static LIST_HEAD(event_subsystems);
> >  
> > -static int subsystem_open(struct inode *inode, struct file *filp)
> > +struct trace_subsystem_dir *trace_get_system_dir(struct inode *inode)
> >  {
> > -	struct trace_subsystem_dir *dir = NULL, *iter_dir;
> > -	struct trace_array *tr = NULL, *iter_tr;
> > -	struct event_subsystem *system = NULL;
> > -	int ret;
> > +	struct trace_subsystem_dir *dir;
> > +	struct trace_array *tr = NULL;  
> 
> nit: This also no need to be initialized.

Hmm, I guess this was needed in one of the versions I had before posting.

I'll fix in v3.

> 
> >  
> > -	if (tracing_is_disabled())
> > -		return -ENODEV;
> > +	guard(mutex)(&event_mutex);
> > +	guard(mutex)(&trace_types_lock);
> >  
> >  	/* Make sure the system still exists */
> > -	mutex_lock(&event_mutex);
> > -	mutex_lock(&trace_types_lock);
> > -	list_for_each_entry(iter_tr, &ftrace_trace_arrays, list) {
> > -		list_for_each_entry(iter_dir, &iter_tr->systems, list) {
> > -			if (iter_dir == inode->i_private) {
> > +	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
> > +		list_for_each_entry(dir, &tr->systems, list) {
> > +			if (dir == inode->i_private) {
> >  				/* Don't open systems with no events */
> > -				tr = iter_tr;
> > -				dir = iter_dir;
> > -				if (dir->nr_events) {
> > -					__get_system_dir(dir);
> > -					system = dir->subsystem;
> > -				}
> > -				goto exit_loop;
> > +				if (!dir->nr_events)
> > +					return NULL;
> > +				if (__trace_array_get(tr) < 0)
> > +					return NULL;
> > +				__get_system_dir(dir);
> > +				return dir;
> >  			}
> >  		}
> >  	}


> >  static ssize_t event_trigger_regex_write(struct file *file,
> >  					 const char __user *ubuf,
> >  					 size_t cnt, loff_t *ppos)
> >  {
> >  	struct trace_event_file *event_file;
> >  	ssize_t ret;
> > -	char *buf __free(kfree) = NULL;
> > +	char *buf __free(kfree) = get_user_buf(ubuf, cnt);
> >  
> > -	if (!cnt)
> > +	if (!buf)
> >  		return 0;
> >  
> > -	if (cnt >= PAGE_SIZE)
> > -		return -EINVAL;
> > -
> > -	buf = memdup_user_nul(ubuf, cnt);
> >  	if (IS_ERR(buf))
> >  		return PTR_ERR(buf);  
> 
> You can simply write:
> 
> 	if (IS_ERR_OR_NULL(buf))
> 		return PTR_ERR_OR_ZERO(buf);

Yes I can. But honestly, the above is much harder to understand what is
happening than the code I had written.

I mean:

	if (!buf)
		return 0;

	if (IS_ERR(buf))
		return PTR_ERR(buf);

is pretty obvious of what is happening.

	if (IS_ERR_OR_NULL(buf))
		return PTR_ERR_OR_ZERO(buf);

Is quite a bit more obfuscated. I mean, I needed to look up what
PTR_ERR_OR_ZERO() did to be sure I knew what was returned.

> > +	list_for_each_entry(file, &tr->events, list) {
> > +
> > +		if (strcmp(system->name, file->event_call->class->system) != 0)
> > +			continue;
> > +
> > +		ret = p->parse(p, file, buff, command, next);
> > +
> > +		/* Removals and existing events do not error */
> > +		if (ret < 0 && ret != -EEXIST && !remove) {
> > +			pr_warn("Failed adding trigger %s on %s\n",
> > +				command, trace_event_name(file->event_call));
> > +		}  
> 
> 
> Can I expect that this can recover the previous settings
> via event trigger?
> e.g. 
> 
> # echo "stacktrace" > events/sched/sched_wakeup/trigger
> # echo "stacktrace" > events/sched/trigger
> # echo "!stacktrace" > events/sched/trigger
> # cat events/sched/sched_wakeup/trigger
> stacktrace:unlimited
> 
> ?

No. In fact, this is one of the features of the system trigger. Writing
into the system/trigger file is the same as writing into each of the
system's event's trigger files one at a time. In fact, I updated the
documentation in this patch to show that this file can be used to clear
tiggers too!

+       echo snapshot > events/sched/sched_waking/trigger
+       cat events/sched/sched_waking/trigger
+       snapshot:unlimited
+       echo '!snapshot' > events/sched/trigger
+       cat events/sched/sched_waking/trigger
+       # Available triggers:
+       # traceon traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist


> 
> 
> > +	}
> > +	return 0;
> > +}
> > +
> > +static ssize_t
> > +event_system_trigger_write(struct file *filp, const char __user *ubuf,
> > +		    size_t cnt, loff_t *ppos)
> > +{
> > +	struct trace_subsystem_dir *dir = filp->private_data;
> > +	struct event_command *p;
> > +	char *command, *next;
> > +	char *buf __free(kfree) = get_user_buf(ubuf, cnt);
> > +	bool remove = false;
> > +	bool found = false;
> > +	ssize_t ret;
> > +
> > +	if (!buf)
> > +		return 0;
> > +
> > +	if (IS_ERR(buf))
> > +		return PTR_ERR(buf);  
> 
> Ditto.

And ditto again with my reply ;-)

> 
> > +
> > +	/* system triggers are not allowed to have counters */
> > +	if (strchr(buf, ':'))
> > +		return -EINVAL;  
> 
> ':' is not always used for counters (e.g. hist) and it seems odd
> to check anything about parse here. Can we do this counter check
> after parse a command?
> 
> > +
> > +	/* If opened for read too, dir is in the seq_file descriptor */
> > +	if (filp->f_mode & FMODE_READ) {
> > +		struct seq_file *m = filp->private_data;
> > +		dir = m->private;
> > +	}
> > +
> > +	/* Skip added space at beginning of buf */
> > +	next = strim(buf);
> > +
> > +	command = strsep(&next, " \t");
> > +	if (next) {
> > +		next = skip_spaces(next);
> > +		if (!*next)
> > +			next = NULL;
> > +	}  
> 
> strim() removes both leading and trailing whitespace. So this check is
> not required.

But next here is not the one that had strim() attached to it.

	command = strsep(&next, " \t");

Updates the content of next.

Thanks for the review,

-- Steve