Message-ID: <20250609113955.47a2f44c@gandalf.local.home>
Date: Mon, 9 Jun 2025 11:39:55 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Linux trace kernel
<linux-trace-kernel@...r.kernel.org>, Masami Hiramatsu
<mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Jan Kiszka <jan.kiszka@...mens.com>, Andreas Ziegler
<ziegler.andreas@...mens.com>, Felix MOESSBAUER
<felix.moessbauer@...mens.com>, Julien Flot <julien.flot@...mens.com>
Subject: Re: [PATCH v2] tracing: Fix regression of filter waiting a long
time on RCU synchronization
On Mon, 9 Jun 2025 05:21:19 -0700
"Paul E. McKenney" <paulmck@...nel.org> wrote:
> > +static void free_filter_list(struct rcu_head *rhp)
> > +{
> > + struct filter_head *filter_list = container_of(rhp, struct filter_head, rcu);
> > + struct filter_list *filter_item, *tmp;
> > +
> > + list_for_each_entry_safe(filter_item, tmp, &filter_list->list, list) {
> > + __free_filter(filter_item->filter);
> > + list_del(&filter_item->list);
> > + kfree(filter_item);
> > + }
>
> If the series of synchronize_rcu() calls was taking 99 seconds,
> we might well be going through the above loop quite a few times, all
> within a bh-disabled region of code. For example, the calls from both
> filter_free_subsystem_filters() and process_system_preds() look like
> they can queue a very large number of entries.
The iteration is over each event in a subsystem.

To see how many events each subsystem has:
~# for sys in /sys/kernel/tracing/events/* ; do cnt=`ls -F $sys |grep '/$' | wc -l`; echo $cnt $sys; done | sort -n | tail
24 /sys/kernel/tracing/events/tcp
26 /sys/kernel/tracing/events/xen
28 /sys/kernel/tracing/events/sched
32 /sys/kernel/tracing/events/writeback
33 /sys/kernel/tracing/events/libata
34 /sys/kernel/tracing/events/irq_vectors
55 /sys/kernel/tracing/events/xhci-hcd
92 /sys/kernel/tracing/events/kvm
113 /sys/kernel/tracing/events/ext4
724 /sys/kernel/tracing/events/syscalls
So the maximum number of iterations is 724 for the syscalls subsystem, plus
one for the system filter. Adding printks to count the loops, I verified that:
[..]
[ 100.019209] LOOPED 4 times
[ 100.020033] LOOPED 5 times
[ 100.020791] LOOPED 8 times
[ 100.021583] LOOPED 10 times
[ 100.022239] LOOPED 2 times
[ 100.110515] LOOPED 725 times
[ 100.111520] LOOPED 4 times
[ 100.112889] LOOPED 25 times
[ 100.114416] LOOPED 6 times
[ 100.115950] LOOPED 4 times
[ 100.117557] LOOPED 7 times
[..]
(724 event filters and one system event filter = 725 filters).
If you feel that's too big, we could move it to a work queue.
As more system calls get added, that number will grow, but it is bounded by
the number of system calls. All the other event systems have far fewer events
than that.
>
> So...
>
> > + kfree(filter_list);
> > +}
> > +
> > +static void free_filter_list_tasks(struct rcu_head *rhp)
> > +{
> > + call_rcu(rhp, free_filter_list);
>
> ...why not use queue_rcu_work() in place of call_rcu() here, thus avoiding
> possible response-time issues?
>
> Or is there something that I am missing that limits the length of the
> list passed to free_filter_list()?
Just the number of events in a system.
I could switch to a work queue, but I don't think that needs to go in as a
fix, does it? That is, it can wait until the next merge window. Or is this
going to affect things now?
-- Steve