Message-ID: <20250424090539.0O37K8vN@linutronix.de>
Date: Thu, 24 Apr 2025 11:05:39 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Benjamin Berg <benjamin@...solutions.net>,
linux-modules@...r.kernel.org, linux-kernel@...r.kernel.org,
johannes@...solutions.net, Daniel Gomez <da.gomez@...sung.com>,
Luis Chamberlain <mcgrof@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Petr Pavlu <petr.pavlu@...e.com>,
Sami Tolvanen <samitolvanen@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v3 15/28] module: Use RCU in all users of
__module_text_address().
On 2025-04-23 11:16:49 [-0700], Paul E. McKenney wrote:
> On Wed, Apr 23, 2025 at 05:17:31PM +0200, Benjamin Berg wrote:
> > Hi,
> >
> > On Wed, 2025-01-08 at 10:04 +0100, Sebastian Andrzej Siewior wrote:
> > > __module_text_address() can be invoked within a RCU section, there is no
> > > requirement to have preemption disabled.
> > >
> > > Replace the preempt_disable() section around __module_text_address()
> > > with RCU.
> >
> > Unfortunately, this patch causes a performance regression for us. The
> > trouble is that we enable kmemleak and run trace-cmd so a lot of stack
> > traces need to be collected. Obviously, we also have lockdep enabled.
> >
> > Now, combine this with the UML stack dumping code calling into
> > __kernel_text_address a lot[1] and it really has a relevant performance
> > impact. I saw the kernel spending 40% of its own CPU time just on the
> > lock in is_module_text_address.
> >
> > Maybe kernel_text_address should leave the RCU handling to the caller
> > and assume that the RCU read lock is already taken?
> >
> > Benjamin
> >
> > [1] The UM arch dump_stack function reads every "unsigned long" on the
> > stack and tests it using __kernel_text_address.
>
> Use of a single guard(rcu)() is regressing performance? Interesting and
> quite unexpected. That said, given the amount of debug you have enabled,
> I am not so sure that people are going to be all that excited about a
> further performance regression.
>
> But is this regression due to the cleanup hook that guard(rcu)()
> registers? If so, please feel free to try using rcu_read_lock()
> and rcu_read_unlock() instead. I would be surprised if this makes a
> difference, but then again, your initial regression report also comes
> as a surprise, so...
>
> Another way to reduce guard(rcu)() overhead is to build your kernel
> with CONFIG_PREEMPT_NONE=y. Not so good for real-time response, but
> then again, neither are your debug options.
The guard notation is not the regression, I guess; it is just the sheer
number of rcu_read_lock()/rcu_read_unlock() calls. We had one regression
which was "fixed" by
commit ee57ab5a32129 ("locking/lockdep: Disable KASAN instrumentation of lockdep.c").
My guess would be that this is a preemptible kernel and the preempt
disable/enable pair is cheaper than the RCU version. So going back to a
non-preemptible kernel should "fix" it.
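To illustrate what changed (a rough sketch of the two patterns, not the
patch verbatim): the old code disabled preemption around the lookup, the
new code opens an RCU read-side section instead. With lockdep enabled,
every rcu_read_lock()/rcu_read_unlock() pair goes through lockdep
accounting, which is where the per-call cost can add up:

```c
/* Before: preemption disabled around the module lookup (sketch). */
bool is_module_text_address(unsigned long addr)
{
	bool ret;

	preempt_disable();
	ret = __module_text_address(addr) != NULL;
	preempt_enable();
	return ret;
}

/* After: an RCU read-side section instead (sketch). On a preemptible
 * kernel with lockdep, rcu_read_lock()/unlock() is the more expensive
 * of the two per invocation.
 */
bool is_module_text_address(unsigned long addr)
{
	guard(rcu)();
	return __module_text_address(addr) != NULL;
}
```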
Looking at kernel_text_address(), is_bpf_text_address() also has an RCU
read section, so it is probably subject to the same trouble. And
is_ftrace_trampoline() could also be converted to RCU, which would
increase the trouble.
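Benjamin's suggestion of leaving the RCU handling to the caller could look
roughly like this (a sketch; __kernel_text_address_rcu() is a hypothetical
name for a variant that assumes the caller already holds the RCU read
lock): the stack walker takes one read-side section for the whole walk
instead of one lock/unlock pair per tested word:

```c
/* Sketch: hoist the RCU section into the UML stack dumper, so a single
 * rcu_read_lock() covers all lookups. __kernel_text_address_rcu() is
 * illustrative; it would be kernel_text_address() minus the internal
 * RCU locking.
 */
static void dump_stack_walk(const unsigned long *sp, unsigned long nr_words)
{
	unsigned long i;

	guard(rcu)();	/* one read-side section for the whole walk */
	for (i = 0; i < nr_words; i++) {
		if (__kernel_text_address_rcu(sp[i]))
			print_ip_sym(KERN_INFO, sp[i]);
	}
}
```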
Improving the stack trace on UM or caching some of the most common
results might help. I am not sure whether disabling kmemleak for lockdep
is possible or whether it would make a difference.
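The caching idea could be as small as remembering the last successful
lookup (purely hypothetical, since repeated dumps tend to hit the same
return addresses; most words on the stack are not text addresses and
still pay the full cost):

```c
/* Hypothetical one-entry cache in front of __kernel_text_address() for
 * the UML stack dumper. Illustrative only; a real version would need to
 * be invalidated on module unload.
 */
static unsigned long last_text_hit;

static bool cached_kernel_text_address(unsigned long addr)
{
	if (addr && addr == last_text_hit)
		return true;
	if (!__kernel_text_address(addr))
		return false;
	last_text_hit = addr;
	return true;
}
```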
> Thanx, Paul
Sebastian