[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <55A98808.9010307@plumgrid.com>
Date: Fri, 17 Jul 2015 15:56:08 -0700
From: Alexei Starovoitov <ast@...mgrid.com>
To: kaixu xia <xiakaixu@...wei.com>, davem@...emloft.net,
acme@...nel.org, mingo@...hat.com, a.p.zijlstra@...llo.nl,
masami.hiramatsu.pt@...achi.com, jolsa@...nel.org
Cc: wangnan0@...wei.com, linux-kernel@...r.kernel.org,
pi3orama@....com, hekuang@...wei.com
Subject: Re: [RFC PATCH 0/6] bpf: Introduce the new ability of eBPF programs
to access hardware PMU counter
On 7/17/15 3:43 AM, kaixu xia wrote:
> There are many useful PMUs provided by X86 and other architectures. By
> combining PMU, kprobe and eBPF program together, many interesting things
> can be done. For example, by probing at sched:sched_switch we can
> measure IPC changing between different processes by watching 'cycle' PMU
> counter; by probing at entry and exit points of a kernel function we are
> able to compute cache miss rate for a function by collecting
> 'cache-misses' counter and see the differences. In summary, we can
> define the begin and end points of a procedure, insert kprobes on them,
> attach two BPF programs and let them collect specific PMU counter.
that would be definitely a useful feature.
As far as overall design I think it should be done slightly differently.
The addition of 'flags' to all maps is a bit hacky and it seems has few
holes. It's better to reuse 'store fds into maps' code that prog_array
is doing. You can add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY
and reuse most of the arraymap.c code.
The program also wouldn't need to do lookup+read_pmu, so instead of:
r0 = 0 (the chosen key: CPU-0)
*(u32 *)(fp - 4) = r0
value = bpf_map_lookup_elem(map_fd, fp - 4);
count = bpf_read_pmu(value);
you will be able to do:
count = bpf_perf_event_read(perf_event_array_map_fd, index)
which will be faster.
note, I'd prefer 'bpf_perf_event_read' name for the helper.
Then inside helper we really cannot do mutex, sleep or smp_call,
but since programs are always executed in preempt disabled
and never from NMI, I think something like the following should work:
u64 bpf_perf_event_read(u64 r1, u64 index,...)
{
struct bpf_perf_event_array *array = (void *) (long) r1;
struct perf_event *event;
if (unlikely(index >= array->map.max_entries))
return -EINVAL;
event = array->events[index];
if (event->state != PERF_EVENT_STATE_ACTIVE)
return -EINVAL;
if (event->oncpu != raw_smp_processor_id())
return -EINVAL;
__perf_event_read(event);
return perf_event_count(event);
}
not sure whether we need to disable irq around __perf_event_read,
I think it should be ok without.
Also during store of FD into perf_event_array you'd need
to filter out all crazy events. I would limit it to few
basic types first.
btw, make sure you do your tests with lockdep and other debugs on.
and for the sample code please use C for the bpf program. Not many
people can read bpf asm ;)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists