linux-kernel - Re: [PATCH v9 12/14] x86: perf: intel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 30 Jan 2015 11:48:59 +0200
From:	Alexander Shishkin <alexander.shishkin@...ux.intel.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Ingo Molnar <mingo@...hat.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Robert Richter <rric@...nel.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Mike Galbraith <efault@....de>,
	Paul Mackerras <paulus@...ba.org>,
	Stephane Eranian <eranian@...gle.com>,
	Andi Kleen <ak@...ux.intel.com>, kan.liang@...el.com,
	adrian.hunter@...el.com, markus.t.metzger@...el.com,
	Mathieu Poirier <mathieu.poirier@...aro.org>,
	Kaixu Xia <kaixu.xia@...aro.org>, acme@...radead.org
Subject: Re: [PATCH v9 12/14] x86: perf: intel_pt: Intel PT PMU driver

On 29 January 2015 at 17:20, Peter Zijlstra <peterz@...radead.org> wrote:
> On Thu, Jan 29, 2015 at 05:03:21PM +0200, Alexander Shishkin wrote:
>> > We're already holding ctx->mutex, this should have made lockdep scream.
>>
>> As I mentioned offline, cpuctx->ctx.mutex is set to a lockdep class of
>> its own, so lockdep doesn't see this. It is, of course, still a
>> problem.
>
> Right; I don't think we currently use that annotation of
> cpuctx->ctx.mutex but I had indeed overlooked that.
>
> Per perf_ctx_lock() the nesting order is:
>
>  cpuctx
>    ctx
>
> And here we would have done the inverse. Still I'm not entirely sure it
> would've resulted in a deadlock, we typically don't take multiple
> ctx->mutex locks, and where we do its either between different context
> on the same CPU (eg, moving a software event to the hardware lists), or
> between the same context on different CPUs (perf_pmu_migrate_context),
> or between inherited contexts (put_event, in child->parent nesting).
>
> None of those cases seem to overlap with this order.

Indeed.

> However,
>
>> But as you pointed out, if we grab the exclusive_cnt for per-task
>> cpu!=-1 events as well, we don't need to look into other contexts here
>> at all.
>
> its irrelevant because we can avoid the entire ordeal ;-)

Exactly. :)

>> > --- a/kernel/events/core.c
>> > +++ b/kernel/events/core.c
>> > @@ -3487,6 +3487,9 @@ static void __free_event(struct perf_eve
>> >         if (event->destroy)
>> >                 event->destroy(event);
>> >
>> > +       if (event->pmu && event->ctx)
>> > +               exclusive_event_release(event);
>>
>> It looks like event can be already removed from its context at this
>> point, so event->ctx will be NULL and the counter would leak or am I
>> missing something?
>
> Yeah, event->ctx is _magic_ ;-)
>
> event->ctx is never NULL, although it can change. Much fun because of
> that; see: lkml.kernel.org/r/20150123125159.696530128@...radead.org if
> you like to torture yourself a wee bit.

Yeah, I already did, having noticed mutex_lock_double() in the context
of your diff. And I'll probably re-read it a few more times. :)

>> I used the event->attach_state & PERF_ATTACH_TASK to see if it's a
>> per-task counter, which seems reliable even though it might need
>> documenting.
>
> Yes, I suppose that is indeed better, I had feared we clear the
> ATTACH_TASK state on remove_from_context() like we do all the other
> ATTACH states, but we do not.

Yep. It probably needs documenting though, in case somebody decides to
change this in future.

>> > +static bool exclusive_event_ok(struct perf_event *event,
>> > +                              struct perf_event_context *ctx)
>>
>> Then, maybe exclusive_event_installable() is a better name, because
>> this one only deals with the current context; cpu-wide vs per-task
>> case is taken care of in perf_event_alloc()/__free_event().
>
> OK.
>
>> > +       /*
>> > +        * exclusive_cnt <0: cpu
>> > +        *               >0: tsk
>> > +        */
>> > +       if (ctx->task) {
>> > +               if (!atomic_inc_unless_negative(&pmu->exclusive_cnt))
>> > +                       return false;
>> > +       } else {
>> > +               if (!atomic_dec_unless_positive(&pmu->exclusive_cnt))
>> > +                       return false;
>> > +       }
>>
>> So I would like to keep this bit in perf_event_alloc() path and the
>> reverse in __free_event(),
>
> Fair enough; exclusive_event_init() ?

Ok.

>
>> > +
>> > +       mutex_lock(&ctx->lock);
>> > +       ret = __exclusive_event_ok(event, ctx);
>> > +       mutex_unlock(&ctx->lock);
>
> And as you pointed out on IRC, this needs to be in the same section as
> install_in_context() otherwise we have a race.
>
>> > +       if (!ret) {
>> > +               if (ctx->task)
>> > +                       atomic_dec(&pmu->exclusive_cnt);
>> > +               else
>> > +                       atomic_inc(&pmu->exclusive_cnt);
>> > +       }
>>
>> in which case we don't need to undo the counter here, because it will
>> still go through __free_event() in the error path.
>
> OK; exclusive_event_destroy() ?

Yep.

Cheers,
--
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/