Message-Id: <1586437540.j6vekko069.naveen@linux.ibm.com>
Date: Thu, 09 Apr 2020 18:46:47 +0530
From: "Naveen N. Rao" <naveen.n.rao@...ux.ibm.com>
To: Jiri Olsa <jolsa@...nel.org>,
Masami Hiramatsu <mhiramat@...nel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@...el.com>,
"bibo,mao" <bibo.mao@...el.com>,
"David S. Miller" <davem@...emloft.net>,
lkml <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
"Ziqian SUN (Zamir)" <zsun@...hat.com>
Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within
kprobe_flush_task

Hi Masami,

Masami Hiramatsu wrote:
> Hi Jiri,
>
> On Wed, 8 Apr 2020 18:46:41 +0200
> Jiri Olsa <jolsa@...nel.org> wrote:
>
>> hi,
>> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
>
> Hmm, kprobe is lockless, but kretprobe involves spinlock.
> Thus, eventually, I will blacklist the _raw_spin_lock_irqsave()
> for kretprobe.

As far as I can see, this is the only place where probing
_raw_spin_lock_irqsave() is an issue. Should we blacklist it for all
users for this case alone?

> If you need to trace spinlock return, please consider to putting
> kprobe at "ret" instruction.
>
>> My test was also able to trigger lockdep output:
>>
>> ============================================
>> WARNING: possible recursive locking detected
>> 5.6.0-rc6+ #6 Not tainted
>> --------------------------------------------
>> sched-messaging/2767 is trying to acquire lock:
>> ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
>>
>> but task is already holding lock:
>> ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>>
>> other info that might help us debug this:
>> Possible unsafe locking scenario:
>>
>> CPU0
>> ----
>> lock(&(kretprobe_table_locks[i].lock));
>> lock(&(kretprobe_table_locks[i].lock));
>>
>> *** DEADLOCK ***
>>
>> May be due to missing lock nesting notation
>>
>> 1 lock held by sched-messaging/2767:
>> #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>>
>> stack backtrace:
>> CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
>> Call Trace:
>> dump_stack+0x96/0xe0
>> __lock_acquire.cold.57+0x173/0x2b7
>> ? native_queued_spin_lock_slowpath+0x42b/0x9e0
>> ? lockdep_hardirqs_on+0x590/0x590
>> ? __lock_acquire+0xf63/0x4030
>> lock_acquire+0x15a/0x3d0
>> ? kretprobe_hash_lock+0x52/0xa0
>> _raw_spin_lock_irqsave+0x36/0x70
>> ? kretprobe_hash_lock+0x52/0xa0
>> kretprobe_hash_lock+0x52/0xa0
>> trampoline_handler+0xf8/0x940
>> ? kprobe_fault_handler+0x380/0x380
>> ? find_held_lock+0x3a/0x1c0
>> kretprobe_trampoline+0x25/0x50
>> ? lock_acquired+0x392/0xbc0
>> ? _raw_spin_lock_irqsave+0x50/0x70
>> ? __get_valid_kprobe+0x1f0/0x1f0
>> ? _raw_spin_unlock_irqrestore+0x3b/0x40
>> ? finish_task_switch+0x4b9/0x6d0
>> ? __switch_to_asm+0x34/0x70
>> ? __switch_to_asm+0x40/0x70
>>
>> The code within the kretprobe handler checks for probe reentrancy,
>> so we won't trigger any _raw_spin_lock_irqsave probe in there.
>>
>> The problem is outside the handler, in kprobe_flush_task, where we call:
>>
>>   kprobe_flush_task
>>     kretprobe_table_lock
>>       raw_spin_lock_irqsave
>>         _raw_spin_lock_irqsave
>>
>> where _raw_spin_lock_irqsave triggers the kretprobe and installs
>> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
>
> Hmm, OK. In this case, I think we should mark this process is
> going to die and never try to kretprobe on it.
>
>>
>> The kretprobe_trampoline handler is then executed with already
>> locked kretprobe_table_locks, and first thing it does is to
>> lock kretprobe_table_locks ;-) the whole lockup path like:
>>
>>   kprobe_flush_task
>>     kretprobe_table_lock
>>       raw_spin_lock_irqsave
>>         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
>>
>>         ---> kretprobe_table_locks locked
>>
>>         kretprobe_trampoline
>>           trampoline_handler
>>             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
>>
>> The change below sets current_kprobe in kprobe_flush_task, so the probe
>> recursion protection check is hit and the probe is never set. It seems
>> to fix the deadlock.
>>
>> I'm not sure this is the best fix, any ideas are welcome ;-)
>
> Hmm, this is a bit tricky to fix this issue. Of course, temporary disable
> kprobes (and kretprobe) on an area by filling current_kprobe might
> be a good idea, but it also involves other kprobes.

I'm not sure what you mean by that. Jiri's RFC patch would disable
k[ret]probes within kprobe_flush_task(), which is only ever invoked from
finish_task_switch(). From there, I only see calls to spinlock functions
and kfree(). Besides, kprobe_flush_task() itself is NOKPROBE, so ideally
we would not want to trace/probe the other functions it calls.

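To make the effect of the RFC approach concrete, here is a rough userspace
model of the recursion guard. It is only a sketch and not kernel code: it
assumes a single CPU and a single table lock, and flush_guard,
probed_lock() and flush_task_model() are hypothetical names invented for
the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: the names echo the kernel's, the bodies are invented. */
struct kprobe {
	const char *symbol;
	unsigned long nmissed;
};

/* Per-CPU in the kernel; this sketch models a single CPU. */
static struct kprobe *current_kprobe;

/* Hypothetical placeholder probe, used only to mark "probing is off here". */
static struct kprobe flush_guard = { "flush guard (hypothetical)", 0 };

static bool table_lock_held;   /* stands in for one kretprobe_table_locks[] entry */
static bool trampoline_armed;  /* return address replaced by the trampoline */
static int deadlocks;          /* attempts to take the lock while it is held */

/* Entry half of a kretprobe placed on the lock function. */
static void kretprobe_entry(struct kprobe *rp)
{
	if (current_kprobe) {      /* recursion check: a probe is already "running" */
		rp->nmissed++;
		return;
	}
	trampoline_armed = true;
}

/* Return half: the trampoline handler takes the table lock first thing. */
static void kretprobe_return(void)
{
	if (!trampoline_armed)
		return;
	trampoline_armed = false;
	if (table_lock_held)
		deadlocks++;           /* a real spinlock would spin forever here */
}

/* The probed lock function, as reached from the flush path. */
static void probed_lock(struct kprobe *rp)
{
	kretprobe_entry(rp);
	table_lock_held = true;
	kretprobe_return();
}

/* Returns how many self-deadlocks one flush would hit. */
static int flush_task_model(struct kprobe *rp, bool use_guard)
{
	assert(!table_lock_held);  /* the model starts each flush unlocked */
	deadlocks = 0;
	if (use_guard)
		current_kprobe = &flush_guard;
	probed_lock(rp);
	table_lock_held = false;   /* unlock */
	if (use_guard)
		current_kprobe = NULL;
	return deadlocks;
}
```

In this model, calling flush_task_model() without the guard records one
self-deadlock, while calling it with the guard records none and bumps
nmissed on the probe instead. Other probes hit inside the guarded region
would be skipped in the same way.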
>
> How about let kretprobe skip the task which state == TASK_DEAD ?
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 627fc1b7011a..3f207d2e0afb 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1874,9 +1874,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> * To avoid deadlocks, prohibit return probing in NMI contexts,
> * just skip the probe and increase the (inexact) 'nmissed'
> * statistical counter, so that the user is informed that
> - * something happened:
> + * something happened.
> + * Also, if the current task is dead, we will already in the process
> + * to reclaim kretprobe instances from hash list. To avoid memory
> + * leak, skip to run the kretprobe on such task.
> */
> - if (unlikely(in_nmi())) {
> + if (unlikely(in_nmi()) || current->state == TASK_DEAD) {

I'm wondering if this actually works. kprobe_flush_task() seems to be
called from finish_task_switch(), after the task switch is complete. So
the current task won't actually be dead here.

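A small userspace model of that ordering, purely as a sketch: current_task
stands in for 'current', the TASK_DEAD value is illustrative, and
state_check_skips_probe() and finish_task_switch_model() are hypothetical
names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; the state values are illustrative. */
enum { TASK_RUNNING = 0x0000, TASK_DEAD = 0x0080 };

struct task {
	const char *comm;
	long state;
};

static struct task *current_task;  /* stands in for the kernel's 'current' */

/* The check from the proposed patch, evaluated against 'current'. */
static bool state_check_skips_probe(void)
{
	return current_task->state == TASK_DEAD;
}

/*
 * Model of the tail of a context switch: by the time this runs, 'current'
 * is already the incoming task and 'prev' is the one that may be dead.
 * Returns whether the state check would have suppressed a kretprobe hit
 * while 'prev' is being flushed.
 */
static bool finish_task_switch_model(struct task *prev, struct task *next)
{
	bool suppressed = false;

	assert(prev != NULL && next != NULL);
	current_task = next;                 /* the switch has already completed */
	if (prev->state == TASK_DEAD)        /* the flush of 'prev' happens here */
		suppressed = state_check_skips_probe();
	return suppressed;
}
```

With a dead 'prev' and a running 'next', the model returns false: the check
looks at the incoming task, so it never fires during the flush.
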
- Naveen