Message-ID: <0EA2C468-0D02-45A7-AED0-4298E8BC5D87@fb.com>
Date:   Mon, 28 May 2018 18:24:09 +0000
From:   Song Liu <songliubraving@...com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>,
        "tj@...nel.org" <tj@...nel.org>,
        "jolsa@...nel.org" <jolsa@...nel.org>
Subject: Re: [RFC 2/2] perf: Sharing PMU counters across compatible events



> On May 28, 2018, at 4:15 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> 
> On Fri, May 04, 2018 at 04:11:02PM -0700, Song Liu wrote:
>> Connection among perf_event and perf_event_dup are built with function
>> rebuild_event_dup_list(cpuctx). This function is only called when events
>> are added/removed or when a task is scheduled in/out. So it is not on
>> critical path of perf_rotate_context().
> 
> Why is perf_rotate_context() the only critical path? I would say the
> context switch path is rather critical too.
> 
>> @@ -2919,8 +3014,10 @@ static void ctx_sched_out(struct perf_event_context *ctx,
>> 
>> 	if (ctx->task) {
>> 		WARN_ON_ONCE(cpuctx->task_ctx != ctx);
>> -		if (!ctx->is_active)
>> +		if (!ctx->is_active) {
>> 			cpuctx->task_ctx = NULL;
>> +			rebuild_event_dup_list(cpuctx);
>> +		}
>> 	}
>> 
>> 	/*
> 
>> +static void rebuild_event_dup_list(struct perf_cpu_context *cpuctx)
>> +{
>> +	int dup_count = cpuctx->ctx.nr_events;
>> +	struct perf_event_context *ctx = cpuctx->task_ctx;
>> +	struct sched_in_data sid = {
>> +		.ctx = ctx,
>> +		.cpuctx = cpuctx,
>> +		.can_add_hw = 1,
>> +	};
>> +
>> +	if (ctx)
>> +		dup_count += ctx->nr_events;
>> +
>> +	kfree(cpuctx->dup_event_list);
>> +	cpuctx->dup_event_count = 0;
>> +
>> +	cpuctx->dup_event_list =
>> +		kzalloc(sizeof(struct perf_event_dup) * dup_count, GFP_ATOMIC);
> 
> 
> __schedule()
>  local_irq_disable()
>  raw_spin_lock(rq->lock)
>  context_switch()
>    prepare_task_switch()
>      perf_event_task_sched_out()
>        __perf_event_task_sched_out()
> 	  perf_event_context_sched_out()
> 	    task_ctx_sched_out()
> 	      ctx_sched_out()
> 	        rebuild_event_dup_list()
> 		  kzalloc()
> 		    ...
> 		      spin_lock()
> 
> Also, as per the above, this nests a regular spin lock inside the
> (raw) rq->lock, which is a no-no.
> 
> Not to mention that whole O(n) crud in the scheduling path...
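
Good point on the allocation. One way around allocating under rq->lock
might be to do the allocation in a sleepable path (e.g. under ctx->mutex
when events are added or removed), so the scheduling path never calls
into the allocator. A rough sketch of the idea (the helper name is
illustrative, not from the patch):

static int prealloc_dup_event_list(struct perf_cpu_context *cpuctx,
				   int dup_count)
{
	struct perf_event_dup *new, *old;

	/* GFP_KERNEL is fine here: we are not under rq->lock */
	new = kcalloc(dup_count, sizeof(*new), GFP_KERNEL);
	if (!new)
		return -ENOMEM;

	/* caller must serialize against the sched-in/out path */
	old = cpuctx->dup_event_list;
	cpuctx->dup_event_list = new;
	cpuctx->dup_event_count = 0;
	kfree(old);

	return 0;
}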

I think we can also fix the scheduling path. To achieve this, we need
to limit sharing to within a single ctx: events in cpuctx->ctx can only
share a PMU counter with other events in cpuctx->ctx, not with events
in cpuctx->task_ctx. This would probably also solve the locking issue
here. Let me try it.
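
Roughly, the compatibility check would then gain a same-ctx test,
something like this (names are illustrative, not from the patch):

static bool can_share_counter(struct perf_event *a, struct perf_event *b)
{
	/* never share a hardware counter across contexts */
	if (a->ctx != b->ctx)
		return false;

	/* existing compatibility check (attrs, PMU, etc.) */
	return events_compatible(a, b);
}

That way rebuild_event_dup_list() only ever walks a single context's
event list, and the cpuctx->ctx side would no longer need to be rebuilt
on every task switch.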

Thanks,
Song

