linux-kernel - Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush

Message-Id: <20200409213806.7657ec27d1b5cbd8243505b9@kernel.org>
Date:   Thu, 9 Apr 2020 21:38:06 +0900
From:   Masami Hiramatsu <mhiramat@...nel.org>
To:     Jiri Olsa <jolsa@...nel.org>
Cc:     "Naveen N. Rao" <naveen.n.rao@...ux.ibm.com>,
        Anil S Keshavamurthy <anil.s.keshavamurthy@...el.com>,
        "David S. Miller" <davem@...emloft.net>,
        Peter Zijlstra <peterz@...radead.org>,
        lkml <linux-kernel@...r.kernel.org>,
        "bibo,mao" <bibo.mao@...el.com>,
        "Ziqian SUN (Zamir)" <zsun@...hat.com>
Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within
 kprobe_flush_task

Hi Jiri,

On Wed,  8 Apr 2020 18:46:41 +0200
Jiri Olsa <jolsa@...nel.org> wrote:

> hi,
> Ziqian reported a lockup when adding a retprobe on _raw_spin_lock_irqsave.

Hmm, kprobe is lockless, but kretprobe involves a spinlock.
Thus, eventually, I will blacklist _raw_spin_lock_irqsave()
for kretprobe.
If you need to trace the spinlock return, please consider putting
a kprobe at the "ret" instruction.

> My test was also able to trigger lockdep output:
> 
>  ============================================
>  WARNING: possible recursive locking detected
>  5.6.0-rc6+ #6 Not tainted
>  --------------------------------------------
>  sched-messaging/2767 is trying to acquire lock:
>  ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
> 
>  but task is already holding lock:
>  ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(&(kretprobe_table_locks[i].lock));
>    lock(&(kretprobe_table_locks[i].lock));
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 
>  1 lock held by sched-messaging/2767:
>   #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> 
>  stack backtrace:
>  CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
>  Call Trace:
>   dump_stack+0x96/0xe0
>   __lock_acquire.cold.57+0x173/0x2b7
>   ? native_queued_spin_lock_slowpath+0x42b/0x9e0
>   ? lockdep_hardirqs_on+0x590/0x590
>   ? __lock_acquire+0xf63/0x4030
>   lock_acquire+0x15a/0x3d0
>   ? kretprobe_hash_lock+0x52/0xa0
>   _raw_spin_lock_irqsave+0x36/0x70
>   ? kretprobe_hash_lock+0x52/0xa0
>   kretprobe_hash_lock+0x52/0xa0
>   trampoline_handler+0xf8/0x940
>   ? kprobe_fault_handler+0x380/0x380
>   ? find_held_lock+0x3a/0x1c0
>   kretprobe_trampoline+0x25/0x50
>   ? lock_acquired+0x392/0xbc0
>   ? _raw_spin_lock_irqsave+0x50/0x70
>   ? __get_valid_kprobe+0x1f0/0x1f0
>   ? _raw_spin_unlock_irqrestore+0x3b/0x40
>   ? finish_task_switch+0x4b9/0x6d0
>   ? __switch_to_asm+0x34/0x70
>   ? __switch_to_asm+0x40/0x70
> 
> The code within the kretprobe handler checks for probe reentrancy,
> so we won't trigger any _raw_spin_lock_irqsave probe in there.
> 
> The problem is outside of that, in kprobe_flush_task, where we call:
> 
>   kprobe_flush_task
>     kretprobe_table_lock
>       raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave
> 
> where _raw_spin_lock_irqsave triggers the kretprobe and installs
> the kretprobe_trampoline handler on the _raw_spin_lock_irqsave return.

Hmm, OK. In this case, I think we should mark that this process is
going to die and never try to kretprobe it.

> 
> The kretprobe_trampoline handler is then executed with
> kretprobe_table_locks already locked, and the first thing it does is
> lock kretprobe_table_locks ;-) The whole lockup path looks like:
> 
>   kprobe_flush_task
>     kretprobe_table_lock
>       raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> 
>         ---> kretprobe_table_locks locked
> 
>         kretprobe_trampoline
>           trampoline_handler
>             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> 
> The change below sets current_kprobe in kprobe_flush_task, so the probe
> recursion protection check is hit and the probe is never set. It seems
> to fix the deadlock.
> 
> I'm not sure this is the best fix, any ideas are welcome ;-)

Hmm, this issue is a bit tricky to fix. Of course, temporarily disabling
kprobes (and kretprobes) on an area by filling current_kprobe might
be a good idea, but it also affects other kprobes.
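
To illustrate the concern about filling current_kprobe, here is a minimal
user-space sketch, not kernel code. It assumes the recursion check simply
refuses to run any probe while a "current probe" marker is set; every
model_* name is hypothetical and exists only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
struct model_probe {
	const char *name;
	unsigned long hits;	/* handler actually ran */
	unsigned long nmissed;	/* hit dropped by the recursion guard */
};

/* Models the per-CPU "a probe is being handled right now" marker. */
static struct model_probe *model_current_probe;

/* Called whenever execution reaches a probed address. */
static bool model_probe_hit(struct model_probe *p)
{
	if (model_current_probe != NULL) {
		/* Guard is set: count the miss and do not run the handler. */
		p->nmissed++;
		return false;
	}
	model_current_probe = p;
	p->hits++;			/* the handler body would run here */
	model_current_probe = NULL;
	return true;
}

/*
 * Models a code region protected by pre-filling the marker with a dummy
 * probe. Every probe hit inside the region is dropped, whatever its type.
 */
static void model_guarded_region(struct model_probe *dummy,
				 struct model_probe **probes, size_t n)
{
	struct model_probe *saved = model_current_probe;
	size_t i;

	assert(dummy != NULL);
	model_current_probe = dummy;
	for (i = 0; i < n; i++)
		model_probe_hit(probes[i]);
	model_current_probe = saved;
}
```

In this model, a plain probe and a return probe hit inside the guarded
region are both counted in nmissed, which is the side effect on other
kprobes mentioned above.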

How about letting kretprobe skip tasks whose state == TASK_DEAD?

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 627fc1b7011a..3f207d2e0afb 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1874,9 +1874,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 	 * To avoid deadlocks, prohibit return probing in NMI contexts,
 	 * just skip the probe and increase the (inexact) 'nmissed'
 	 * statistical counter, so that the user is informed that
-	 * something happened:
+	 * something happened.
+	 * Also, if the current task is dead, we are already in the process of
+	 * reclaiming kretprobe instances from the hash list. To avoid a memory
+	 * leak, skip running the kretprobe on such a task.
 	 */
-	if (unlikely(in_nmi())) {
+	if (unlikely(in_nmi()) || current->state == TASK_DEAD) {
 		rp->nmissed++;
 		return 0;
 	}

-- 
Masami Hiramatsu <mhiramat@...nel.org>