Message-ID: <526b12e4-4bb0-47b1-bece-66b47bfc0a92@paulmck-laptop>
Date: Thu, 18 Jan 2024 06:41:04 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Chen Zhongjin <chenzhongjin@...wei.com>, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, yangjihong1@...wei.com,
naveen.n.rao@...ux.ibm.com, anil.s.keshavamurthy@...el.com,
davem@...emloft.net, mhiramat@...nel.org, tglx@...utronix.de,
peterz@...radead.org, pmladek@...e.com, dianders@...omium.org,
npiggin@...il.com, mpe@...erman.id.au, jkl820.git@...il.com,
juerg.haefliger@...onical.com, rick.p.edgecombe@...el.com,
eric.devolder@...cle.com, mic@...ikod.net
Subject: Re: [PATCH] kprobes: Use synchronize_rcu_tasks_rude in
kprobe_optimizer
On Wed, Jan 17, 2024 at 12:31:33PM -0800, Andrew Morton wrote:
> On Wed, 17 Jan 2024 06:16:36 +0000 Chen Zhongjin <chenzhongjin@...wei.com> wrote:
>
> > There is a deadlock scenario in kprobe_optimizer():
> >
> > pid A                           pid B                   pid C
> > kprobe_optimizer()              do_exit()               perf_kprobe_init()
> > mutex_lock(&kprobe_mutex)       exit_tasks_rcu_start()  mutex_lock(&kprobe_mutex)
> > synchronize_rcu_tasks()         zap_pid_ns_processes()  // waiting kprobe_mutex
> > // waiting tasks_rcu_exit_srcu  kernel_wait4()
> >                                 // waiting pid C exit
> >
> > To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in kprobe_optimizer()
> > rather than synchronize_rcu_tasks(). synchronize_rcu_tasks_rude() also guarantees
> > that all preempted tasks have scheduled, but it does not wait on tasks_rcu_exit_srcu.
Hello, Chen Zhongjin,
Relying on the non-preemptability of the last portion of the do_exit()
function does appear to be the basis for a much better solution than
what we currently have. So at the very least, thank you for the idea!!!
I feel a bit silly for not having thought of it myself. ;-)
However, just invoking synchronize_rcu_tasks_rude() will be bad for both
energy efficiency and real-time response, because
synchronize_rcu_tasks_rude() sends an IPI to each and every online CPU,
almost none of which will be in the non-preemptible tail of do_exit()
at any given time. These additional unnecessary IPIs will drain battery
when they hit idle CPUs and degrade real-time response when they hit
CPUs running aggressive real-time applications, which might not make
people happy.
So, would you be willing to use RCU's do_exit() hooks and RCU's hook
into the scheduler (either rcu_note_context_switch() or rcu_tasks_qs(),
whichever would work better) to maintain a per-CPU variable that is a
pointer to the task in the non-preemptible tail of do_exit(), if there
is one, or NULL otherwise? This would get us the deadlock-avoidance
simplicity of your underlying idea, while avoiding (almost all of)
the energy-efficiency and real-time-response issues with your patch.
This does require a bit of memory-ordering care, so if you would prefer
that I do the implementation, just let me know.
(The memory-ordering trick is to use "smp_mb(); smp_load_acquire();" to
sample the per-CPU variable and "smp_store_release(); smp_mb();" to
update it.)
Thanx, Paul
> > Signed-off-by: Chen Zhongjin <chenzhongjin@...wei.com>
>
> Thanks. Should we backport this fix into earlier kernels? If so, are
> we able to identify a suitable Fixes: target?