Message-ID: <7468e7a2-e5f9-498c-a985-a1584969719b@paulmck-laptop>
Date: Mon, 9 Jun 2025 09:09:35 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
	Linux trace kernel <linux-trace-kernel@...r.kernel.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Jan Kiszka <jan.kiszka@...mens.com>,
	Andreas Ziegler <ziegler.andreas@...mens.com>,
	Felix MOESSBAUER <felix.moessbauer@...mens.com>,
	Julien Flot <julien.flot@...mens.com>
Subject: Re: [PATCH v2] tracing: Fix regression of filter waiting a long time
 on RCU synchronization

On Mon, Jun 09, 2025 at 11:39:55AM -0400, Steven Rostedt wrote:
> On Mon, 9 Jun 2025 05:21:19 -0700
> "Paul E. McKenney" <paulmck@...nel.org> wrote:
> 
> > > +static void free_filter_list(struct rcu_head *rhp)
> > > +{
> > > +	struct filter_head *filter_list = container_of(rhp, struct filter_head, rcu);
> > > +	struct filter_list *filter_item, *tmp;
> > > +
> > > +	list_for_each_entry_safe(filter_item, tmp, &filter_list->list, list) {
> > > +		__free_filter(filter_item->filter);
> > > +		list_del(&filter_item->list);
> > > +		kfree(filter_item);
> > > +	}  
> > 
> > If the series of synchronize_rcu() calls was taking 99 seconds,
> > we might well be going through the above loop quite a few times, all
> > within a bh-disabled region of code.  For example, the calls from both
> > filter_free_subsystem_filters() and process_system_preds() look like
> > they can queue a very large number of entries.
> 
> The iteration runs once per event in a subsystem.
> 
> To see how large that can be:
> 
> ~# for sys in /sys/kernel/tracing/events/* ; do cnt=`ls -F $sys |grep '/$' | wc -l`; echo $cnt $sys; done | sort -n | tail
> 24 /sys/kernel/tracing/events/tcp
> 26 /sys/kernel/tracing/events/xen
> 28 /sys/kernel/tracing/events/sched
> 32 /sys/kernel/tracing/events/writeback
> 33 /sys/kernel/tracing/events/libata
> 34 /sys/kernel/tracing/events/irq_vectors
> 55 /sys/kernel/tracing/events/xhci-hcd
> 92 /sys/kernel/tracing/events/kvm
> 113 /sys/kernel/tracing/events/ext4
> 724 /sys/kernel/tracing/events/syscalls
> 
> So the most iterations that can happen is 724 for syscalls, plus one for
> the system filter. Adding printks to count the loops, I verified that:
> 
> [..]
> [  100.019209] LOOPED 4 times
> [  100.020033] LOOPED 5 times
> [  100.020791] LOOPED 8 times
> [  100.021583] LOOPED 10 times
> [  100.022239] LOOPED 2 times
> [  100.110515] LOOPED 725 times
> [  100.111520] LOOPED 4 times
> [  100.112889] LOOPED 25 times
> [  100.114416] LOOPED 6 times
> [  100.115950] LOOPED 4 times
> [  100.117557] LOOPED 7 times
> [..]
> 
> (724 event filters and one system event filter = 725 filters).
> 
> If you feel that's too big, we could move it to a work queue.

That does sound like a bit more than would be good.
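
For concreteness, here is a minimal sketch of the queue_rcu_work() approach
suggested above, which moves the freeing loop out of softirq (bh-disabled)
context and into a workqueue once the grace period has elapsed.  It assumes
struct filter_head carries a struct rcu_work in place of its rcu_head, and
the names free_filter_list_work() and defer_free_filter_list() are
illustrative, not taken from the actual patch:

struct filter_head {
	struct list_head	list;
	struct rcu_work		rwork;	/* assumed: replaces struct rcu_head rcu */
};

/* Runs in process context after a grace period, so a long list no
 * longer extends a bh-disabled region. */
static void free_filter_list_work(struct work_struct *work)
{
	struct filter_head *filter_list =
		container_of(to_rcu_work(work), struct filter_head, rwork);
	struct filter_list *filter_item, *tmp;

	list_for_each_entry_safe(filter_item, tmp, &filter_list->list, list) {
		__free_filter(filter_item->filter);
		list_del(&filter_item->list);
		kfree(filter_item);
	}
	kfree(filter_list);
}

/* Hypothetical call site, replacing the call_rcu() step above: */
static void defer_free_filter_list(struct filter_head *filter_list)
{
	INIT_RCU_WORK(&filter_list->rwork, free_filter_list_work);
	queue_rcu_work(system_wq, &filter_list->rwork);
}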

> As more system calls get added, that number will grow. But it is limited to
> the number of system calls.
> 
> All the other event systems have far fewer events than that.
> 
> > 
> > So...
> > 
> > > +	kfree(filter_list);
> > > +}
> > > +
> > > +static void free_filter_list_tasks(struct rcu_head *rhp)
> > > +{
> > > +	call_rcu(rhp, free_filter_list);  
> > 
> > ...why not use queue_rcu_work() in place of call_rcu() here, thus avoiding
> > possible response-time issues?
> > 
> > Or is there something that I am missing that limits the length of the
> > list passed to free_filter_list()?
> 
> Just the number of events in a system.
> 
> I could switch to a work queue, but I don't think that needs to go in as
> a fix, do you? That is, it can wait till the next merge window. Or is this
> going to affect things now?

Thank you, and the next merge window should be fine.

							Thanx, Paul
