linux-kernel - Re: [PATCH v3] trace/pid_list: optimize pid

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20251113105106.4270f6ac@gandalf.local.home>
Date: Thu, 13 Nov 2025 10:51:06 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Yongliang Gao <leonylgao@...il.com>, mhiramat@...nel.org,
 mathieu.desnoyers@...icios.com, linux-kernel@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org, frankjpliu@...cent.com, Yongliang Gao
 <leonylgao@...cent.com>, Huang Cun <cunhuang@...cent.com>
Subject: Re: [PATCH v3] trace/pid_list: optimize pid_list->lock contention

On Thu, 13 Nov 2025 16:35:08 +0100
Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:

> > A couple of issues. One, the chunks are fully used. There's no place to put
> > a "rcu_head" in them. Well, we may be able to make use of them.  
> 
> This could be the first (16?) bytes of the memory chunk.

Yes, but this will make things more complex.

> 
> > Second, if there's a lot of tasks exiting and forking, we can easily run
> > out of chunks that are waiting to be "freed" via call_rcu().  
> 
> but this is a general RCU problem and not new here. The task_struct and
> everything around it (including stack) is RCU freed.

But this doesn't happen under memory pressure. Allocations are not
performed if a chunk is not available. If a chunk isn't available, then we
just stop tracing the filtered task. Hmm, it currently silently fails.
Perhaps I need to add a printk when this happens.

> > > This adding/ add-on-fork, removing and remove-on-exit is the only write
> > > side?  
> > 
> > That and manual writes to the set_ftrace_pid file.  
> 
> This looks like minimal. I miss understood then that context switch can
> also contribute to it.

sched_switch is only a reader, not a writer.

> And the callers of trace_pid_list_is_set() are always in the RCU read
> section then? I assume so, since it wouldn't make sense otherwise.

Yes, because they are only tested in sched_switch and fork and exit tracepoints.

Although, this was written when tracepoint callbacks were always called
under preempt disable. Perhaps we need to change that call to:

	tracepoint_synchronize_unregister()

Or add a preempt_disable() around the callers.

I'm very nervous about using RCU here. It will add a lot more corner cases
that needs to be accounted for. The complexity doesn't appear to be worth
it. I'd rather just keep the raw spin locks than to convert it to RCU.

The seqcount makes sense to me. It's simple and keeps the same paradigm as
what we have. What's wrong with using it?

-- Steve