Message-ID: <87fuvtyx37.fsf@ashishki-desk.ger.corp.intel.com>
Date: Mon, 14 Mar 2016 16:04:44 +0200
From: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
vince@...ter.net, eranian@...gle.com,
Arnaldo Carvalho de Melo <acme@...radead.org>
Subject: Re: [PATCH v2 2/5] perf: Free aux pages in unmap path
Peter Zijlstra <peterz@...radead.org> writes:
> On Fri, Mar 04, 2016 at 03:42:46PM +0200, Alexander Shishkin wrote:
>> @@ -4649,10 +4679,22 @@ static void perf_mmap_close(struct vm_area_struct *vma)
>> */
>> if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
>> atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
>> + /*
>> + * Stop all aux events that are writing to this here buffer,
>> + * so that we can free its aux pages and corresponding pmu
>> + * data. Note that after rb::aux_mmap_count dropped to zero,
>> + * they won't start any more (see perf_aux_output_begin()).
>> + */
>> + perf_pmu_output_stop(event);
>
> So to me it seems like we're interested in rb, we don't particularly
> care about @event in this case.
Yeah, @event is used only for its rb and pmu down this path.
>> + if (!has_aux(event))
>> + return;
>> +
>
> if (!parent)
> parent = event;
>
>> + if (rcu_dereference(event->rb) == rb)
> s/event/parent/
>
>> + ro->err = __perf_event_stop(event);
>
>> + else if (parent && rcu_dereference(parent->rb) == rb)
>> + ro->err = __perf_event_stop(event);
>
> and these can go.. However..
You're right: it's the parent that holds the ->rb, but it may well be an
inherited child event that's actually running and writing data there, so
we need to stop any running children as well. I think I actually broke
this bit since the previous version; it needs a comment explaining this.
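Folding your two checks into one, the per-event callback might look
something like the sketch below (a kernel-context sketch only, not
compilable standalone; the "stop the running child via the parent's rb"
comment is the one I mean to add):

```c
static void __perf_event_output_stop(struct perf_event *event, void *data)
{
	struct perf_event *parent = event->parent;
	struct remote_output *ro = data;
	struct ring_buffer *rb = ro->rb;

	if (!has_aux(event))
		return;

	if (!parent)
		parent = event;

	/*
	 * It's the parent that is mmap'ed and owns the rb, but an
	 * inherited child may be the event that is actually scheduled
	 * in and writing AUX data, so match on the parent's rb and
	 * stop whichever event we were called for.
	 */
	if (rcu_dereference(parent->rb) == rb)
		ro->err = __perf_event_stop(event);
}
```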
>
>> +}
>> +
>> +static int __perf_pmu_output_stop(void *info)
>> +{
>> + struct perf_event *event = info;
>> + struct pmu *pmu = event->pmu;
>> + struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
>> + struct remote_output ro = {
>> + .rb = event->rb,
>> + };
>> +
>> + rcu_read_lock();
>> + perf_event_aux_ctx(&cpuctx->ctx, __perf_event_output_stop, &ro);
>> + if (cpuctx->task_ctx)
>> + perf_event_aux_ctx(cpuctx->task_ctx, __perf_event_output_stop,
>> + &ro);
>> + rcu_read_unlock();
>> +
>> + return ro.err;
>> +}
>> +
>> +static void perf_pmu_output_stop(struct perf_event *event)
>> +{
>> + int cpu, err;
>> +
>> + /* better be thorough */
>> + get_online_cpus();
>> +restart:
>> + for_each_online_cpu(cpu) {
>> + err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
>> + if (err)
>> + goto restart;
>> + }
>> + put_online_cpus();
>> +}
>
> This seems wildly overkill, could we not iterate rb->event_list like we
> do for the normal buffer?
Actually we can. One problem, though, is that iterating rb::event_list
requires an RCU read-side section or the irqsafe rb::event_lock, while we
also need to send IPIs from inside that walk. The normal buffer case
tears down rb::event_list as it goes, so it can close the RCU read
section right after it fetches one event from the list. In this case,
however, we must keep the list intact.
> Sure, we need to IPI for each event found, but that seems better than
> unconditionally sending IPIs to all CPUs.
Actually, won't it "often" be the case that the number of events is a
multiple of the number of cpus? The usual use case is one event per task
per cpu with inheritance enabled; in that case we zap multiple events
per IPI.
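For comparison, the per-event alternative could be sketched roughly as
below (hypothetical kernel-context sketch, assuming a __perf_event_stop()
that returns -EAGAIN when the event migrated and we need to retry, by
analogy with the restart loop in the quoted code):

```c
static void perf_pmu_output_stop(struct perf_event *event)
{
	struct perf_event *iter;
	int err, cpu;

restart:
	rcu_read_lock();
	list_for_each_entry_rcu(iter, &event->rb->event_list, rb_entry) {
		/*
		 * Only IPI the cpu where this event is (or may be)
		 * running, instead of broadcasting to every online cpu.
		 */
		cpu = iter->cpu;
		if (cpu == -1)
			cpu = READ_ONCE(iter->oncpu);

		if (cpu == -1)
			continue;

		err = cpu_function_call(cpu, __perf_event_stop, iter);
		if (err == -EAGAIN) {
			/* Event moved under us; drop RCU and retry. */
			rcu_read_unlock();
			goto restart;
		}
	}
	rcu_read_unlock();
}
```

The trade-off is as discussed: this sends one IPI per (possibly
running) event rather than one per online cpu, but the list walk has to
stay inside the RCU read section the whole time.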
Regards,
--
Alex