Date:   Wed, 7 Dec 2016 19:34:55 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Mark Rutland <mark.rutland@....com>
Cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        jeremy.linton@....com
Subject: Re: Perf hotplug lockup in v4.9-rc8

On Wed, Dec 07, 2016 at 05:53:47PM +0000, Mark Rutland wrote:
> On Wed, Dec 07, 2016 at 01:52:17PM +0000, Mark Rutland wrote:
> > Hi all
> > 
> > Jeremy noticed a kernel lockup on arm64 when the perf tool was used in
> > parallel with hotplug, which I've reproduced on arm64 and x86(-64) with
> > v4.9-rc8. In both cases I'm using defconfig; I've tried enabling lockdep
> > but it was silent for arm64 and x86.
> 
> It looks like we're trying to install a task-bound event into a context
> where task_cpu(ctx->task) is dead, and thus the cpu_function_call() in
> perf_install_in_context() fails. We retry repeatedly.
> 
> On !PREEMPT (as with x86 defconfig), we manage to prevent the hotplug
> machinery from making progress, and this turns into a livelock.
> 
> On PREEMPT (as with arm64 defconfig), I'm somewhat lost.
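
For reference, a test-case of the shape described above might look like
the following hypothetical sketch (not the actual reproducer; cpu1 and
the event choice are guesses):

#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	if (fork() == 0) {
		/* CPU hotplug churn; needs root and a hotpluggable cpu1. */
		for (;;) {
			system("echo 0 > /sys/devices/system/cpu/cpu1/online");
			system("echo 1 > /sys/devices/system/cpu/cpu1/online");
		}
	}

	/* Repeatedly install task-bound events on short-lived tasks. */
	for (;;)
		system("perf stat -e cycles true > /dev/null 2>&1");
}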

So the problem is that even with PREEMPT we can hit a blocked task
that has a 'dead' cpu.

We'll spin until either the task wakes up or the CPU does; either can
take a very long time.
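
The retry loop in question looks roughly like this (a simplified
paraphrase of the v4.9 perf_install_in_context() around the hunk below,
not verbatim):

again:
	/* IPIs task_cpu(task); fails unless the task is currently running. */
	if (!task_function_call(task, __perf_install_in_context, event))
		return;

	raw_spin_lock_irq(&ctx->lock);
	task = ctx->task;
	if (WARN_ON_ONCE(task == TASK_TOMBSTONE)) {
		raw_spin_unlock_irq(&ctx->lock);
		return;
	}
	raw_spin_unlock_irq(&ctx->lock);
	/*
	 * Since !ctx->is_active doesn't mean anything, we must IPI
	 * unconditionally.
	 */
	goto again;	/* spins for as long as the task stays blocked */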

How exactly your test-case triggers this is still a mystery; all it
executes is 'true', and that really shouldn't block much.

In any case, the below cures things, but it is, as the comment says,
horrific.

I've yet to come up with a better patch that doesn't violate scheduler
internals.

---
 kernel/events/core.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6ee1febdf6ff..6faf4b03396e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1203,6 +1203,11 @@ static void put_ctx(struct perf_event_context *ctx)
  *	      perf_event_context::lock
  *	    perf_event::mmap_mutex
  *	    mmap_sem
+ *
+ *    task_struct::pi_lock
+ *      rq::lock
+ *        perf_event_context::lock
+ *
  */
 static struct perf_event_context *
 perf_event_ctx_lock_nested(struct perf_event *event, int nesting)
@@ -2352,6 +2357,28 @@ perf_install_in_context(struct perf_event_context *ctx,
 		return;
 	}
 	raw_spin_unlock_irq(&ctx->lock);
+
+	raw_spin_lock_irq(&task->pi_lock);
+	if (!(task->state == TASK_RUNNING || task->state == TASK_WAKING)) {
+		/*
+		 * XXX horrific hack...
+		 */
+		raw_spin_lock(&ctx->lock);
+		if (task != ctx->task) {
+			raw_spin_unlock(&ctx->lock);
+			raw_spin_unlock_irq(&task->pi_lock);
+			goto again;
+		}
+
+		add_event_to_ctx(event, ctx);
+		raw_spin_unlock(&ctx->lock);
+		raw_spin_unlock_irq(&task->pi_lock);
+		return;
+	}
+	raw_spin_unlock_irq(&task->pi_lock);
+
+	cond_resched();
+
 	/*
 	 * Since !ctx->is_active doesn't mean anything, we must IPI
 	 * unconditionally.
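
FWIW, the reason holding ->pi_lock pins ->state is that try_to_wake_up()
takes it before touching ->state; paraphrasing the relevant bit of
kernel/sched/core.c from memory (simplified, exact v4.9 details may
differ):

	/* try_to_wake_up(), simplified: */
	raw_spin_lock_irqsave(&p->pi_lock, flags);
	if (!(p->state & state))
		goto out;
	/* ... */
	p->state = TASK_WAKING;	/* can't happen while we hold pi_lock */

So with ->pi_lock held and the task observed in a blocked state, the task
cannot be woken, its context cannot become active, and adding the event
directly to the (inactive) context is safe; that is what the hack relies
on.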
