linux-kernel - Re: [PATCH v3] trace/pid_list: optimize pid

Message-ID: <20251113102445.3e70c1ec@gandalf.local.home>
Date: Thu, 13 Nov 2025 10:24:45 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Yongliang Gao <leonylgao@...il.com>, mhiramat@...nel.org,
 mathieu.desnoyers@...icios.com, linux-kernel@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org, frankjpliu@...cent.com, Yongliang Gao
 <leonylgao@...cent.com>, Huang Cun <cunhuang@...cent.com>
Subject: Re: [PATCH v3] trace/pid_list: optimize pid_list->lock contention

On Thu, 13 Nov 2025 16:17:29 +0100
Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:

> On 2025-11-13 10:05:24 [-0500], Steven Rostedt wrote:
> > This means that the chunks are not being freed and we can't be doing
> > synchronize_rcu() in every exit.  
> 
> You don't have to, you can do call_rcu().

But the chunks aren't being freed. They may be reused right away.


> So if the kfree() is not an issue, it is just the use of the block from
> the freelist which must not point to a wrong item? And therefore the
> seqcount?

Correct.
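
To make the read side concrete, here is a minimal userspace sketch of the retry pattern a sequence counter gives the reader. This is not the code from the patch: the demo_* names, the single 64-bit chunk, and the use of C11 atomics in place of the kernel's seqcount helpers are all assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one pid_list chunk: 64 PIDs per chunk. */
struct demo_chunk {
	_Atomic uint64_t bits;
};

/* Hypothetical list with a single slot, enough to show the race. */
struct demo_pid_list {
	atomic_uint seq;                /* odd while a writer recycles a chunk */
	struct demo_chunk *_Atomic cur; /* chunk currently covering the PID range */
};

/* Reader: sample the counter, spinning while a writer is in progress. */
static unsigned int demo_read_begin(struct demo_pid_list *l)
{
	unsigned int s;

	while ((s = atomic_load(&l->seq)) & 1)
		;
	return s;
}

/* Reader: true if a writer ran since demo_read_begin() returned 'start'. */
static bool demo_read_retry(struct demo_pid_list *l, unsigned int start)
{
	return atomic_load(&l->seq) != start;
}

/* Writer: take a chunk off the (imaginary) freelist and repurpose it. */
static void demo_recycle(struct demo_pid_list *l, struct demo_chunk *c,
			 uint64_t new_bits)
{
	atomic_fetch_add(&l->seq, 1);	/* counter becomes odd */
	atomic_store(&c->bits, new_bits);
	atomic_store(&l->cur, c);
	atomic_fetch_add(&l->seq, 1);	/* counter becomes even again */
}

/* Reader: test a bit, retrying if the chunk was recycled underneath us. */
static bool demo_is_set(struct demo_pid_list *l, unsigned int bit)
{
	unsigned int s;
	bool ret;

	do {
		struct demo_chunk *c;

		s = demo_read_begin(l);
		c = atomic_load(&l->cur);
		ret = c && ((atomic_load(&c->bits) >> bit) & 1);
	} while (demo_read_retry(l, s));

	return ret;
}
```

A reader that sampled the counter before demo_recycle() ran gets true from demo_read_retry() and tests the bit again, so it never acts on a bit that belongs to a different set of PIDs. No grace period is needed for the reuse itself, only for the final kfree().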

> 
> > > So I *think* the RCU approach should be doable and cover this.  
> > 
> > Where would you put the synchronize_rcu()? In do_exit()?  
> 
> simply call_rcu() and let it move to the freelist.

A couple of issues. First, the chunks are fully used. There's no place to put
an "rcu_head" in them. Well, we may be able to make use of them.

Second, if there are a lot of tasks exiting and forking, we can easily run
out of chunks, as they would all be waiting to be "freed" via call_rcu().
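
The second issue can be shown with a toy model. The sketch below assumes a fixed pool of four chunks and a "pending" state standing in for a chunk queued with call_rcu(). The demo_* names and the pool size are hypothetical, and this is not how the kernel's pid_list allocator is written.

```c
#define DEMO_NR_CHUNKS 4	/* hypothetical pool size */

enum demo_state {
	DEMO_FREE = 0,	/* on the freelist, may be handed out */
	DEMO_IN_USE,	/* holds PID bits */
	DEMO_PENDING,	/* released, but waiting for a grace period */
};

struct demo_pool {
	enum demo_state state[DEMO_NR_CHUNKS];
};

/* Hand out a free chunk, or return -1 if none is available right now. */
static int demo_alloc(struct demo_pool *p)
{
	for (int i = 0; i < DEMO_NR_CHUNKS; i++) {
		if (p->state[i] == DEMO_FREE) {
			p->state[i] = DEMO_IN_USE;
			return i;
		}
	}
	return -1;
}

/* Stand-in for call_rcu(): the chunk is not reusable yet. */
static void demo_defer_free(struct demo_pool *p, int idx)
{
	p->state[idx] = DEMO_PENDING;
}

/* Stand-in for a grace period ending: pending chunks become free. */
static int demo_grace_period(struct demo_pool *p)
{
	int released = 0;

	for (int i = 0; i < DEMO_NR_CHUNKS; i++) {
		if (p->state[i] == DEMO_PENDING) {
			p->state[i] = DEMO_FREE;
			released++;
		}
	}
	return released;
}
```

If every chunk is released before a grace period ends, demo_alloc() returns -1 even though no chunk is actually in use. A burst of exits followed by forks hits exactly that window.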

> 
> > Also understanding what this is used for helps in understanding the scope
> > of protection needed.
> > 
> > The pid_list is created when you add anything into one of the pid files in
> > tracefs. Let's use /sys/kernel/tracing/set_ftrace_pid:
> > 
> >   # cd /sys/kernel/tracing
> >   # echo $$ > set_ftrace_pid
> >   # echo 1 > options/function-fork
> >   # cat set_ftrace_pid
> >   2716
> >   2936
> >   # cat set_ftrace_pid
> >   2716
> >   2945
> > 
> > What the above did was to create a pid_list for the function tracer. I
> > added the bash process pid using $$ (2716). Then when I cat the file, it
> > showed the pid for the bash process as well as the pid for the cat process,
> > as the cat process is a child of the bash process. The function-fork option
> > means to add any child process to the set_ftrace_pid if the parent is
> > already in the list. It also means to remove the pid if a process in the
> > list exits.  
> 
> This adding/ add-on-fork, removing and remove-on-exit is the only write
> side?

That and manual writes to the set_ftrace_pid file.

> > What we are protecting against is when one chunk is freed, but then
> > allocated again for a different set of PIDs. Where the reader has the chunk,
> > it was freed and re-allocated and the bit that is about to be checked
> > doesn't represent the bit it is checking for.  
> 
> This I assumed.
> And the kfree() at the end can not happen while there is still a reader?

Correct. That's done by the pid_list user:

In clear_ftrace_pids():

	/* Wait till all users are no longer using pid filtering */
	synchronize_rcu();

	if ((type & TRACE_PIDS) && pid_list)
		trace_pid_list_free(pid_list);

	if ((type & TRACE_NO_PIDS) && no_pid_list)
		trace_pid_list_free(no_pid_list);

-- Steve