linux-kernel - Re: Perf hotplug lockup in v4.9-rc8

Message-ID: <20170111162609.GE26344@leverpostej>
Date:   Wed, 11 Jan 2017 16:26:09 +0000
From:   Mark Rutland <mark.rutland@....com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        jeremy.linton@....com, Will Deacon <will.deacon@....com>
Subject: Re: Perf hotplug lockup in v4.9-rc8

On Wed, Jan 11, 2017 at 05:03:58PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 11, 2017 at 02:59:20PM +0000, Mark Rutland wrote:
> > On Fri, Dec 09, 2016 at 02:59:00PM +0100, Peter Zijlstra wrote:

> > > +	 * If we get a false negative, things are complicated. If we are after
> > > +	 * perf_event_context_sched_in() ctx::lock will serialize us, and the
> > > +	 * value must be correct. If we're before, it doesn't matter since
> > > +	 * perf_event_context_sched_in() will program the counter.
> > > +	 *
> > > +	 * However, this hinges on the remote context switch having observed
> > > +	 * our task->perf_event_ctxp[] store, such that it will in fact take
> > > +	 * ctx::lock in perf_event_context_sched_in().
> > 
> > Sorry if I'm being thick here, but which store are we describing above?
> > i.e. which function, how does that relate to perf_install_in_context()?
> 
> The only store to perf_event_ctxp[] of interest is the initial one in
> find_get_context().

Ah, I see. I'd missed the rcu_assign_pointer() when looking around for
an assignment.

> > I haven't managed to wrap my head around why this matters. :/
> 
> See the scenario from:
> 
>  https://lkml.kernel.org/r/20161212124228.GE3124@twins.programming.kicks-ass.net
> 
> It's installing the first event on 't', which gets migrated to a third
> CPU concurrently with the install.

I was completely failing to consider that this was the installation of
the first event; I should have read the existing comment. Things make a
lot more sense now.

> CPU0            CPU1            CPU2
> 
>                 (current == t)
> 
> t->perf_event_ctxp[] = ctx;
> smp_mb();
> cpu = task_cpu(t);
> 
>                 switch(t, n);
>                                 migrate(t, 2);
>                                 switch(p, t);
> 
>                                 ctx = t->perf_event_ctxp[]; // must not be NULL
> 
> smp_function_call(cpu, ..);
> 
>                 generic_exec_single()
>                   func();
>                     spin_lock(ctx->lock);
>                     if (task_curr(t)) // false
> 
>                     add_event_to_ctx();
>                     spin_unlock(ctx->lock);
> 
>                                 perf_event_context_sched_in();
>                                   spin_lock(ctx->lock);
>                                   // sees event
> 
> 
> 
> So it's CPU0's store of t->perf_event_ctxp[] that must not go 'missing'.
> Because if CPU2's load of that variable were to observe NULL, it would
> not try to schedule the ctx and we'd have a task running without its
> counter, which would be 'bad'.
> 
> As long as we observe !NULL, we'll acquire ctx->lock. If we acquire it
> first and do not see the event yet, then CPU0 must observe task_running()
> and retry. If the install happens first, then we must see the event on
> sched-in and all is well.

I think I follow now. Thanks for bearing with me!
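
For anyone else following the thread, the ordering argument can be modelled
in userspace. What follows is a minimal sketch under stated assumptions, not
the kernel code: every name in it (model_task, model_install, model_sched_in
and so on) is hypothetical, a pthread mutex stands in for ctx->lock,
sequentially consistent C11 atomics stand in for the store plus smp_mb(), and
the IPI retry path is collapsed into "program the counter directly if the
task is seen running". It only illustrates why at least one side must notice
the other.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a perf event context. */
struct model_ctx {
	pthread_mutex_t lock;	/* models ctx->lock */
	bool has_event;		/* event is on the context; protected by lock */
	bool programmed;	/* counter is programmed; protected by lock */
};

/* Hypothetical stand-in for the task 't'. */
struct model_task {
	_Atomic(struct model_ctx *) ctxp;	/* models t->perf_event_ctxp[] */
	atomic_bool running;			/* models "t is current on some CPU" */
	struct model_ctx ctx;
};

/* Installer side (CPU0 in the diagram). */
static void *model_install(void *arg)
{
	struct model_task *t = arg;

	/* Publish the context; seq_cst models the store followed by smp_mb(). */
	atomic_store(&t->ctxp, &t->ctx);

	pthread_mutex_lock(&t->ctx.lock);
	t->ctx.has_event = true;
	/*
	 * Simplification: where the real code would retry against a running
	 * task, this model programs the counter directly.
	 */
	if (atomic_load(&t->running))
		t->ctx.programmed = true;
	pthread_mutex_unlock(&t->ctx.lock);
	return NULL;
}

/* Context-switch side (CPU2 in the diagram). */
static void *model_sched_in(void *arg)
{
	struct model_task *t = arg;
	struct model_ctx *ctx;

	/* The task becomes current, then we look for a context (seq_cst). */
	atomic_store(&t->running, true);
	ctx = atomic_load(&t->ctxp);
	if (ctx) {
		pthread_mutex_lock(&ctx->lock);
		if (ctx->has_event)	/* "sees event" */
			ctx->programmed = true;
		pthread_mutex_unlock(&ctx->lock);
	}
	return NULL;
}

/* Run one race; true if the task ends up running with its counter. */
static bool model_run_once(void)
{
	struct model_task t;
	pthread_t installer, switcher;
	bool ok;
	int rc;

	atomic_init(&t.ctxp, NULL);
	atomic_init(&t.running, false);
	t.ctx.has_event = false;
	t.ctx.programmed = false;
	pthread_mutex_init(&t.ctx.lock, NULL);

	rc = pthread_create(&installer, NULL, model_install, &t);
	assert(rc == 0);
	rc = pthread_create(&switcher, NULL, model_sched_in, &t);
	assert(rc == 0);
	pthread_join(installer, NULL);
	pthread_join(switcher, NULL);

	ok = t.ctx.has_event && t.ctx.programmed;
	pthread_mutex_destroy(&t.ctx.lock);
	return ok;
}

/* Repeat the race and count runs where the counter was left unprogrammed. */
static int model_count_failures(int trials)
{
	int failures = 0;

	for (int i = 0; i < trials; i++)
		if (!model_run_once())
			failures++;
	return failures;
}
```

Calling model_count_failures() with a few hundred trials should return 0 in
this model. Weakening the two atomic_store() calls to relaxed ordering removes
the guarantee that one side observes the other, which is the 'missing' store
case described above (whether a given machine ever exhibits it is a separate
matter).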

> In any case, I'll try and write a proper Changelog for this...

If it's just the commit message and/or comments changing, feel free to
add:

Tested-by: Mark Rutland <mark.rutland@....com>

Thanks,
Mark.