Message-ID: <526b12e4-4bb0-47b1-bece-66b47bfc0a92@paulmck-laptop>
Date: Thu, 18 Jan 2024 06:41:04 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Chen Zhongjin <chenzhongjin@...wei.com>, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, yangjihong1@...wei.com,
naveen.n.rao@...ux.ibm.com, anil.s.keshavamurthy@...el.com,
davem@...emloft.net, mhiramat@...nel.org, tglx@...utronix.de,
peterz@...radead.org, pmladek@...e.com, dianders@...omium.org,
npiggin@...il.com, mpe@...erman.id.au, jkl820.git@...il.com,
juerg.haefliger@...onical.com, rick.p.edgecombe@...el.com,
eric.devolder@...cle.com, mic@...ikod.net
Subject: Re: [PATCH] kprobes: Use synchronize_rcu_tasks_rude in
kprobe_optimizer
On Wed, Jan 17, 2024 at 12:31:33PM -0800, Andrew Morton wrote:
> On Wed, 17 Jan 2024 06:16:36 +0000 Chen Zhongjin <chenzhongjin@...wei.com> wrote:
>
> > There is a deadlock scenario in kprobe_optimizer():
> >
> > pid A                           pid B                   pid C
> > kprobe_optimizer()              do_exit()               perf_kprobe_init()
> > mutex_lock(&kprobe_mutex)       exit_tasks_rcu_start()  mutex_lock(&kprobe_mutex)
> > synchronize_rcu_tasks()         zap_pid_ns_processes()  // waiting kprobe_mutex
> > // waiting tasks_rcu_exit_srcu  kernel_wait4()
> >                                 // waiting pid C exit
> >
> > To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in kprobe_optimizer()
> > rather than synchronize_rcu_tasks(). synchronize_rcu_tasks_rude() also guarantees
> > that all preempted tasks have scheduled, but it does not wait on tasks_rcu_exit_srcu.
Hello, Chen Zhongjin,
Relying on the non-preemptability of the last portion of the do_exit()
function does appear to be the basis for a much better solution than
what we currently have. So at the very least, thank you for the idea!!!
I feel a bit silly for not having thought of it myself. ;-)
However, just invoking synchronize_rcu_tasks_rude() will be bad for both
energy efficiency and real-time response, because
synchronize_rcu_tasks_rude() sends an IPI to each and every online CPU,
almost none of which will be in the non-preemptible tail of do_exit()
at any given time. These additional unnecessary IPIs will drain battery
when they hit idle CPUs and degrade real-time response when they hit
CPUs running aggressive real-time applications, which might not make
people happy.
So, would you be willing to use RCU's do_exit() hooks and RCU's hook
into the scheduler (either rcu_note_context_switch() or rcu_tasks_qs(),
whichever would work better) to maintain a per-CPU variable that is a
pointer to the task in the non-preemptible tail of do_exit(), if there
is one, or NULL otherwise? This would get us the deadlock-avoidance
simplicity of your underlying idea, while avoiding (almost all of)
the energy-efficiency and real-time-response issues with your patch.
This does require a bit of memory-ordering care, so if you would prefer
that I do the implementation, just let me know.
(The memory-ordering trick is to use "smp_mb(); smp_load_acquire();" to
sample the per-CPU variable and "smp_store_release(); smp_mb();" to
update it.)
Thanx, Paul
> > Signed-off-by: Chen Zhongjin <chenzhongjin@...wei.com>
>
> Thanks. Should we backport this fix into earlier kernels? If so, are
> we able to identify a suitable Fixes: target?