Message-ID: <20150212155856.GC21418@twins.programming.kicks-ass.net>
Date: Thu, 12 Feb 2015 16:58:56 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Sukadev Bhattiprolu <sukadev@...ux.vnet.ibm.com>
Cc: mingo@...nel.org, Michael Ellerman <mpe@...erman.id.au>,
Anton Blanchard <anton@....ibm.com>,
Stephane Eranian <eranian@...gle.com>,
Jiri Olsa <jolsa@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH] perf: Implement read_group() PMU operation
On Thu, Feb 05, 2015 at 06:59:15PM -0800, Sukadev Bhattiprolu wrote:
> From: Sukadev Bhattiprolu <sukadev@...ux.vnet.ibm.com>
> Date: Thu Feb 5 20:56:20 EST 2015 -0300
> Subject: [RFC][PATCH] perf: Implement read_group() PMU operation
>
> This is a lightly tested, exploratory patch to allow PMUs to return
> several counters at once. Appreciate any comments :-)
>
> Unlike normal hardware PMCs, the 24x7 counters[1] in Power8 are stored
> in memory and accessed via a hypervisor call (HCALL). A major aspect
> of the HCALL is that it allows retrieving _SEVERAL_ counters at once
> (unlike regular PMCs, which are read one at a time).
>
> This patch implements a ->read_group() PMU operation that tries to
> take advantage of this ability to read several counters at once. A
> PMU that implements the ->read_group() operation would allow users
> to retrieve several counters at once and get a more consistent
> snapshot.
>
> NOTE: This patch has a TODO in h_24x7_event_read_group() in that it
> still does multiple HCALLS. I think that can be optimized
> independently, once the pmu->read_group() interface itself is
> finalized.
>
> Appreciate comments on the ->read_group interface and on how best to
> manage the interfaces between the core and PMU layers - eg: is it OK
> for the hv-24x7 PMU to walk the ->sibling_list ?
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3549,10 +3549,43 @@ static int perf_event_read_group(struct perf_event *event,
You also want perf_output_read_group().
> struct perf_event *leader = event->group_leader, *sub;
> int n = 0, size = 0, ret = -EFAULT;
> struct perf_event_context *ctx = leader->ctx;
> + u64 *valuesp;
> u64 values[5];
> + int use_group_read;
> u64 count, enabled, running;
> + struct pmu *pmu = event->pmu;
> +
> + /*
> + * If PMU supports group read and group read is requested,
> + * allocate memory before taking the mutex.
> + */
> + use_group_read = 0;
> + if ((read_format & PERF_FORMAT_GROUP) && pmu->read_group) {
> + use_group_read++;
> + }
> +
> + if (use_group_read) {
> + valuesp = kzalloc(leader->nr_siblings * sizeof(u64), GFP_KERNEL);
> + if (!valuesp)
> + return -ENOMEM;
> + }
This seems 'sad'; the hardware already knows how many it can maximally
use at once and can preallocate, right?
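Something along these lines, say at PMU init time (an illustrative
sketch only; MAX_24X7_REQUESTS and h_24x7_read_buf are made-up names
here, and the real size would come from whatever the HCALL supports):

        /* Hypothetical: size the buffer once, from the hardware limit,
         * instead of allocating on every read(). */
        #define MAX_24X7_REQUESTS       256     /* assumed HCALL limit */

        static u64 *h_24x7_read_buf;

        static int __init hv_24x7_init(void)
        {
                h_24x7_read_buf = kcalloc(MAX_24X7_REQUESTS, sizeof(u64),
                                          GFP_KERNEL);
                if (!h_24x7_read_buf)
                        return -ENOMEM;

                return perf_pmu_register(&h_24x7_pmu, "hv_24x7", -1);
        }

The read path could then reuse that buffer under ctx->mutex instead of
doing a kzalloc/kfree pair every time.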
>
> mutex_lock(&ctx->mutex);
> +
> + if (use_group_read) {
> + ret = pmu->read_group(leader, valuesp, leader->nr_siblings);
> + if (ret >= 0) {
> + size = ret * sizeof(u64);
> +
> + ret = size;
> + if (copy_to_user(buf, valuesp, size))
> + ret = -EFAULT;
> + }
> +
> + kfree(valuesp);
> + goto unlock;
> + }
> +
> count = perf_event_read_value(leader, &enabled, &running);
>
> values[n++] = 1 + leader->nr_siblings;
Since ->read() has a void return value, we can delay its effect, so I'm
currently thinking we might want to extend the transaction interface for
this; give pmu::start_txn() a flags argument to indicate scheduling
(add) or reading (read).
So we'd end up with something like:

        pmu->start_txn(pmu, PMU_TXN_READ);

        leader->read();
        for_each_sibling()
                sibling->read();

        pmu->commit_txn();
after which we can use the values updated by the read calls. The trivial
no-support implementation lets read do its immediate thing like it does
now.
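Spelled out a bit more (a sketch only; the PERF_PMU_TXN_* names, the
new flags argument and the function name are just the proposal above,
nothing is finalized), the core-side read could look like:

        /* Hypothetical flag values for the extended start_txn(). */
        #define PERF_PMU_TXN_ADD        0x1     /* scheduling transaction */
        #define PERF_PMU_TXN_READ       0x2     /* read transaction */

        static void perf_event_group_read(struct perf_event *leader)
        {
                struct perf_event *sub;
                struct pmu *pmu = leader->pmu;

                pmu->start_txn(pmu, PERF_PMU_TXN_READ);

                /* ->read() may merely queue the counter for a later HCALL. */
                pmu->read(leader);
                list_for_each_entry(sub, &leader->sibling_list, group_entry)
                        sub->pmu->read(sub);

                /* For the trivial implementation this is a no-op. */
                pmu->commit_txn(pmu);
        }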
A more complex driver can then collect the actual counter values and
execute one hypercall using its pre-allocated memory.
So no allocations in the core code, and no sibling iterations in the
driver code.
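On the driver side that could look something like this (again only a
rough sketch; h_24x7_queue_request(), h_24x7_get_value() and
h_24x7_do_hcall_and_update() are hypothetical helpers, and the txn
state handling is simplified to the read case):

        /* Sketch: txn state for the read case, serialized by ctx->mutex. */
        static int h_24x7_txn_flags;
        static int h_24x7_nr_queued;

        static void h_24x7_start_txn(struct pmu *pmu, int flags)
        {
                h_24x7_txn_flags = flags;
                h_24x7_nr_queued = 0;
        }

        static void h_24x7_event_read(struct perf_event *event)
        {
                if (h_24x7_txn_flags & PERF_PMU_TXN_READ) {
                        /* Queue the counter into the preallocated buffer. */
                        h_24x7_queue_request(event, h_24x7_nr_queued++);
                        return;
                }
                /* No transaction: immediate single-counter read, as today. */
                local64_set(&event->count, h_24x7_get_value(event));
        }

        static int h_24x7_commit_txn(struct pmu *pmu)
        {
                int ret = 0;

                if (h_24x7_txn_flags & PERF_PMU_TXN_READ)
                        /* One HCALL fetches all queued counters and updates
                         * the events from the preallocated buffer. */
                        ret = h_24x7_do_hcall_and_update(h_24x7_nr_queued);

                h_24x7_txn_flags = 0;
                return ret;
        }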
Would that work for you?