lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150217083312.GA16221@us.ibm.com>
Date:	Tue, 17 Feb 2015 00:33:12 -0800
From:	Sukadev Bhattiprolu <sukadev@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	mingo@...nel.org, Michael Ellerman <mpe@...erman.id.au>,
	Anton Blanchard <anton@....ibm.com>,
	Stephane Eranian <eranian@...gle.com>,
	Jiri Olsa <jolsa@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH] perf: Implement read_group() PMU operation

Peter Zijlstra [peterz@...radead.org] wrote:
| > --- a/kernel/events/core.c
| > +++ b/kernel/events/core.c
| > @@ -3549,10 +3549,43 @@ static int perf_event_read_group(struct perf_event *event,
| 
| You also want perf_output_read_group().

Ok. Will look into it. We currently don't support sampling with
the 24x7 counters but we should make sure that the new interface
fits correctly.


| 
| >  	struct perf_event *leader = event->group_leader, *sub;
| >  	int n = 0, size = 0, ret = -EFAULT;
| >  	struct perf_event_context *ctx = leader->ctx;
| > +	u64 *valuesp;
| >  	u64 values[5];
| > +	int use_group_read;
| >  	u64 count, enabled, running;
| > +	struct pmu *pmu = event->pmu;
| > +
| > +	/*
| > +	 * If PMU supports group read and group read is requested,
| > +	 * allocate memory before taking the mutex.
| > +	 */
| > +	use_group_read = 0;
| > +	if ((read_format & PERF_FORMAT_GROUP) && pmu->read_group) {
| > +		use_group_read++;
| > +	}
| > +
| > +	if (use_group_read) {
| > +		valuesp = kzalloc(leader->nr_siblings * sizeof(u64), GFP_KERNEL);
| > +		if (!valuesp)
| > +			return -ENOMEM;
| > +	}
| 
| This seems 'sad', the hardware already knows how many it can maximally
| use at once and can preallocate, right?

Yeah :-) In a subsequent version, I got rid of the allocation by moving
the copy_to_user() into the PMU, but still needed to walk the sibling list.
| 
| >  
| >  	mutex_lock(&ctx->mutex);
| > +
| > +	if (use_group_read) {
| > +		ret = pmu->read_group(leader, valuesp, leader->nr_siblings);
| > +		if (ret >= 0) {
| > +			size = ret * sizeof(u64);
| > +
| > +			ret = size;
| > +			if (copy_to_user(buf, valuesp, size))
| > +				ret = -EFAULT;
| > +		}
| > +
| > +		kfree(valuesp);
| > +		goto unlock;
| > +	}
| > +
| >  	count = perf_event_read_value(leader, &enabled, &running);
| >  
| >  	values[n++] = 1 + leader->nr_siblings;
| 
| Since ->read() has a void return value, we can delay its effect, so I'm
| currently thinking we might want to extend the transaction interface for
| this; give pmu::start_txn() a flags argument to indicate scheduling
| (add) or reading (read).
| 
| So we'd end up with something like:
| 
| 	pmu->start_txn(pmu, PMU_TXN_READ);
| 
| 	leader->read();
| 
| 	for_each_sibling()
| 		sibling->read();
| 
| 	pmu->commit_txn();

So, each of the ->read() calls is really "appending a counter" to a
list of counters that the PMU should read and the values for the counters
(results of the read) are only available after the commit_txn() right? 

In which case, perf_event_read_group() would then follow this commit_txn()
with its "normal" read,  and the PMU would return the result cached during
->commit_txn(). If so, we need a way to invalidate the cached result ?

| 
| after which we can use the values updated by the read calls. The trivial
| no-support implementation lets read do its immediate thing like it does
| now.
| 
| A more complex driver can then collect the actual counter values and
| execute one hypercall using its pre-allocated memory.

the hypercall should happen  in the ->commit_txn() right ?

| 
| So no allocations in the core code, and no sibling iterations in the
| driver code.
| 
| Would that work for you?

I think it would, 

I am working on breaking up the underlying code along start/read/commit
lines, and hope to have it done later this week and then can experiment
more with this interface.

Appreciate the input.

Sukadev

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ