Message-ID: <7468e7a2-e5f9-498c-a985-a1584969719b@paulmck-laptop>
Date: Mon, 9 Jun 2025 09:09:35 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
	Linux trace kernel <linux-trace-kernel@...r.kernel.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Jan Kiszka <jan.kiszka@...mens.com>,
	Andreas Ziegler <ziegler.andreas@...mens.com>,
	Felix MOESSBAUER <felix.moessbauer@...mens.com>,
	Julien Flot <julien.flot@...mens.com>
Subject: Re: [PATCH v2] tracing: Fix regression of filter waiting a long time
 on RCU synchronization

On Mon, Jun 09, 2025 at 11:39:55AM -0400, Steven Rostedt wrote:
> On Mon, 9 Jun 2025 05:21:19 -0700
> "Paul E. McKenney" <paulmck@...nel.org> wrote:
> 
> > > +static void free_filter_list(struct rcu_head *rhp)
> > > +{
> > > +	struct filter_head *filter_list = container_of(rhp, struct filter_head, rcu);
> > > +	struct filter_list *filter_item, *tmp;
> > > +
> > > +	list_for_each_entry_safe(filter_item, tmp, &filter_list->list, list) {
> > > +		__free_filter(filter_item->filter);
> > > +		list_del(&filter_item->list);
> > > +		kfree(filter_item);
> > > +	}  
> > 
> > If the series of synchronize_rcu() calls was taking 99 seconds,
> > we might well be going through the above loop quite a few times, all
> > within a bh-disabled region of code.  For example, the calls from both
> > filter_free_subsystem_filters() and process_system_preds() look like
> > they can queue a very large number of entries.
> 
> The iteration runs once per event in a subsystem.
> 
> To see how large that can be:
> 
> ~# for sys in /sys/kernel/tracing/events/* ; do cnt=`ls -F $sys |grep '/$' | wc -l`; echo $cnt $sys; done | sort -n | tail
> 24 /sys/kernel/tracing/events/tcp
> 26 /sys/kernel/tracing/events/xen
> 28 /sys/kernel/tracing/events/sched
> 32 /sys/kernel/tracing/events/writeback
> 33 /sys/kernel/tracing/events/libata
> 34 /sys/kernel/tracing/events/irq_vectors
> 55 /sys/kernel/tracing/events/xhci-hcd
> 92 /sys/kernel/tracing/events/kvm
> 113 /sys/kernel/tracing/events/ext4
> 724 /sys/kernel/tracing/events/syscalls
> 
> So the most iterations that can happen is 724 for syscalls, plus one for
> the system filter. Adding printks to count the loops, I verified that:
> 
> [..]
> [  100.019209] LOOPED 4 times
> [  100.020033] LOOPED 5 times
> [  100.020791] LOOPED 8 times
> [  100.021583] LOOPED 10 times
> [  100.022239] LOOPED 2 times
> [  100.110515] LOOPED 725 times
> [  100.111520] LOOPED 4 times
> [  100.112889] LOOPED 25 times
> [  100.114416] LOOPED 6 times
> [  100.115950] LOOPED 4 times
> [  100.117557] LOOPED 7 times
> [..]
> 
> (724 event filters and one system event filter = 725 filters).
> 
> If you feel that's too big, we could move it to a work queue.

That does sound like a bit more than would be good.
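
For concreteness, here is a minimal sketch of the queue_rcu_work() approach
suggested above, which moves the freeing loop out of softirq (bh-disabled)
context and into a workqueue once the grace period has elapsed.  It assumes
struct filter_head carries a struct rcu_work in place of its rcu_head, and
the names free_filter_list_work() and defer_free_filter_list() are
illustrative, not taken from the actual patch:

struct filter_head {
	struct list_head	list;
	struct rcu_work		rwork;	/* assumed: replaces struct rcu_head rcu */
};

/* Runs in process context after a grace period, so a long list no
 * longer extends a bh-disabled region. */
static void free_filter_list_work(struct work_struct *work)
{
	struct filter_head *filter_list =
		container_of(to_rcu_work(work), struct filter_head, rwork);
	struct filter_list *filter_item, *tmp;

	list_for_each_entry_safe(filter_item, tmp, &filter_list->list, list) {
		__free_filter(filter_item->filter);
		list_del(&filter_item->list);
		kfree(filter_item);
	}
	kfree(filter_list);
}

/* Hypothetical call site, replacing the call_rcu() step above: */
static void defer_free_filter_list(struct filter_head *filter_list)
{
	INIT_RCU_WORK(&filter_list->rwork, free_filter_list_work);
	queue_rcu_work(system_wq, &filter_list->rwork);
}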

> As more system calls get added, that number will grow. But it is limited to
> the number of system calls.
> 
> All the other event systems have far fewer events than that.
> 
> > 
> > So...
> > 
> > > +	kfree(filter_list);
> > > +}
> > > +
> > > +static void free_filter_list_tasks(struct rcu_head *rhp)
> > > +{
> > > +	call_rcu(rhp, free_filter_list);  
> > 
> > ...why not use queue_rcu_work() in place of call_rcu() here, thus avoiding
> > possible response-time issues?
> > 
> > Or is there something that I am missing that limits the length of the
> > list passed to free_filter_list()?
> 
> Just the number of events in a system.
> 
> I could switch to a work queue, but I don't think that needs to go in as
> a fix, do you? That is, it can wait till the next merge window. Or is this
> going to affect things now?

Thank you, and the next merge window should be fine.

							Thanx, Paul
