lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161207195643.GA9027@leverpostej>
Date:   Wed, 7 Dec 2016 19:56:44 +0000
From:   Mark Rutland <mark.rutland@....com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        jeremy.linton@....com
Subject: Re: Perf hotplug lockup in v4.9-rc8

On Wed, Dec 07, 2016 at 07:34:55PM +0100, Peter Zijlstra wrote:
> On Wed, Dec 07, 2016 at 05:53:47PM +0000, Mark Rutland wrote:
> > On Wed, Dec 07, 2016 at 01:52:17PM +0000, Mark Rutland wrote:
> > > Hi all
> > > 
> > > Jeremy noticed a kernel lockup on arm64 when the perf tool was used in
> > > parallel with hotplug, which I've reproduced on arm64 and x86(-64) with
> > > v4.9-rc8. In both cases I'm using defconfig; I've tried enabling lockdep
> > > but it was silent for arm64 and x86.
> > 
> > It looks like we're trying to install a task-bound event into a context
> > where task_cpu(ctx->task) is dead, and thus the cpu_function_call() in
> > perf_install_in_context() fails. We retry repeatedly.
> > 
> > On !PREEMPT (as with x86 defconfig), we manage to prevent the hotplug
> > machinery from making progress, and this turns into a livelock.
> > 
> > On PREEMPT (as with arm64 defconfig), I'm somewhat lost.
> 
> So the problem is that even with PREEMPT we can hit a blocked task
> that has a 'dead' cpu.
> 
> We'll spin until either the task wakes up or the CPU does, either can
> take a very long time.
> 
> How exactly your test-case triggers this, all it executes is 'true' and
> that really shouldn't block much, is a mystery still.

The perf tool forks a helper process, which blocks on a pipe, and once
signalled, execs the target (i.e. true). The main perf process opens
(enable-on-exec) events on that, then writes to the pipe to wake up the
helper.

... so now I see why that makes us see a dead task_cpu(); thanks for the
explanation above!

[...]

> @@ -2352,6 +2357,28 @@ perf_install_in_context(struct perf_event_context *ctx,
>  		return;
>  	}
>  	raw_spin_unlock_irq(&ctx->lock);
> +
> +	raw_spin_lock_irq(&task->pi_lock);
> +	if (!(task->state == TASK_RUNNING || task->state == TASK_WAKING)) {

For a moment I thought there was a remaining race here with the lazy
ctx-switch if the new task was RUNNING on an online CPU, but I guess
we'll retry the cpu_function_call() in that case.

I'll attack this tomorrow when I can think again...

Thanks,
Mark.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ