Message-ID: <1437129816-13176-1-git-send-email-xiakaixu@huawei.com>
Date: Fri, 17 Jul 2015 18:43:30 +0800
From: kaixu xia <xiakaixu@...wei.com>
To: <ast@...mgrid.com>, <davem@...emloft.net>, <acme@...nel.org>,
<mingo@...hat.com>, <a.p.zijlstra@...llo.nl>,
<masami.hiramatsu.pt@...achi.com>, <jolsa@...nel.org>
CC: <xiakaixu@...wei.com>, <wangnan0@...wei.com>,
<linux-kernel@...r.kernel.org>, <pi3orama@....com>,
<hekuang@...wei.com>
Subject: [RFC PATCH 0/6] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
This patch series introduces the ability for eBPF programs to access
hardware PMU counters. Previous discussion on this subject:
https://lkml.org/lkml/2015/5/27/1027.
x86 and other architectures provide many useful PMUs. Combining PMUs,
kprobes and eBPF programs makes many interesting measurements possible.
For example, by probing at sched:sched_switch we can measure how IPC
changes between different processes by watching the 'cycles' PMU
counter; by probing at the entry and exit points of a kernel function
we can compute the cache miss rate of that function by collecting the
'cache-misses' counter and taking the difference. In summary, we can
define the begin and end points of a procedure, insert kprobes on them,
attach two BPF programs and let them collect a specific PMU counter.
Further, by reading those PMU counters a BPF program can provide hints
to resource schedulers.
This patchset allows users to read PMU events in the following way:
1. Open the PMU using perf_event_open() (for each CPU or for each
process to be watched);
2. Create a BPF map with BPF_MAP_FLAG_PERF_EVENT set in its
type field;
3. Insert FDs into the map with some key-value mapping scheme
(e.g. cpuid -> event on that CPU);
4. Load and attach eBPF programs as usual;
5. In the eBPF program, fetch the perf_event from the map with a key
(e.g. the cpuid obtained from bpf_get_smp_processor_id()), then use
bpf_read_pmu() to read from it;
6. Do whatever the user wants with the value.
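As a rough illustration of step 1 only, the sketch below opens a
hardware cycles counter on one CPU with the existing perf_event_open()
syscall. The helper names make_cycles_attr() and open_cycles_on_cpu()
are hypothetical and not part of this patchset. Steps 2-5 depend on the
interfaces proposed in this series and are not shown.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: describe a hardware CPU-cycles counter. */
struct perf_event_attr make_cycles_attr(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	return attr;
}

/*
 * Hypothetical helper: open the counter for all tasks on one CPU.
 * Returns the event fd, or -1 with errno set on failure (for example
 * when the caller lacks the required privileges).
 */
int open_cycles_on_cpu(int cpu)
{
	struct perf_event_attr attr = make_cycles_attr();

	/* pid = -1 and cpu >= 0: count every task on that CPU;
	 * group_fd = -1: no event group; flags = 0. */
	return (int)syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0UL);
}
```

The fd returned for each CPU is what step 3 would store in the map,
keyed by cpuid.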
This patchset consists of the necessary kernel-space changes.
perf will be the normal user-space tool, based on
https://lkml.org/lkml/2015/7/8/823 (perf tools: filtering events
using eBPF programs) and https://lkml.org/lkml/2015/7/13/831
(Make eBPF programs output data to perf); the corresponding
patches are on the way.
Patch 6/6 is a simple example that shows how to use this new eBPF
program ability. The PMU counter data can be found in
/sys/kernel/debug/tracing/trace (the value of the cycles counter
sampled at 'kprobe/sys_write'):
$ ./bpf_pmu_test
$ cat /sys/kernel/debug/tracing/trace
...
syslog-ng-555 [001] dn.1 10189.004626: : bpf count: CPU-0 9935764297
syslog-ng-555 [001] d..1 10189.053776: : bpf count: CPU-0 10000706398
syslog-ng-555 [001] dn.1 10189.102972: : bpf count: CPU-0 10067117321
syslog-ng-555 [001] d..1 10189.152925: : bpf count: CPU-0 10134551505
syslog-ng-555 [001] dn.1 10189.202043: : bpf count: CPU-0 10200869299
syslog-ng-555 [001] d..1 10189.251167: : bpf count: CPU-0 10267179481
syslog-ng-555 [001] dn.1 10189.300285: : bpf count: CPU-0 10333493522
syslog-ng-555 [001] d..1 10189.349410: : bpf count: CPU-0 10399808073
syslog-ng-555 [001] dn.1 10189.398528: : bpf count: CPU-0 10466121583
syslog-ng-555 [001] d..1 10189.447645: : bpf count: CPU-0 10532433368
syslog-ng-555 [001] d..1 10189.496841: : bpf count: CPU-0 10598841104
syslog-ng-555 [001] d..1 10189.546891: : bpf count: CPU-0 10666410564
syslog-ng-555 [001] dn.1 10189.596016: : bpf count: CPU-0 10732729739
syslog-ng-555 [001] d..1 10189.645146: : bpf count: CPU-0 12884941186
syslog-ng-555 [001] d..1 10189.694263: : bpf count: CPU-0 12951249903
syslog-ng-555 [001] dn.1 10189.743382: : bpf count: CPU-0 13017561470
syslog-ng-555 [001] d..1 10189.792506: : bpf count: CPU-0 13083873521
syslog-ng-555 [001] d..1 10189.841631: : bpf count: CPU-0 13150190416
syslog-ng-555 [001] d..1 10189.890749: : bpf count: CPU-0 13216505962
syslog-ng-555 [001] d..1 10189.939945: : bpf count: CPU-0 13282913062
...
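To make the "begin and end points" idea concrete, the sketch below
computes the counter difference between two samples and an average
count rate. It is plain user-space arithmetic, not code from this
series; the function names counter_delta() and rate_per_us() are
hypothetical, and timestamps are assumed to be converted to
microseconds.

```c
#include <stdint.h>

/* Counter difference between a begin and an end sample. */
uint64_t counter_delta(uint64_t begin, uint64_t end)
{
	return end - begin;
}

/*
 * Average counter increments per microsecond between two samples.
 * Returns 0 when both samples carry the same timestamp.
 */
uint64_t rate_per_us(uint64_t ts0_us, uint64_t count0,
		     uint64_t ts1_us, uint64_t count1)
{
	uint64_t dt = ts1_us - ts0_us;

	return dt ? counter_delta(count0, count1) / dt : 0;
}
```

Applied to the first two trace lines above (10189.004626 and
10189.053776), counter_delta(9935764297, 10000706398) gives 64942101
cycles over 49150 us, i.e. roughly 1321 cycles per microsecond.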
The patches are as follows:
Patch 1/6 introduces a map flag. The flag bit is encoded into the type
field passed through attr;
Patch 2/6 introduces a map_traverse_elem() function for further use;
Patch 3/6 converts event file descriptors into perf_event structures
when a new element is added to a map with the flag set;
Patch 4/6 introduces a bpf program function argument constraint for
the PMU map;
Patch 5/6 implements the function bpf_read_pmu(), which reads the
selected hardware PMU counter;
Patch 6/6 gives a simple example.
kaixu xia (6):
bpf: Add new flags that specify the value type stored in map
bpf: Add function map->ops->map_traverse_elem() to traverse map elems
bpf: Save the pointer to struct perf_event to map
bpf: Add a bpf program function argument constraint for PMU map
bpf: Implement function bpf_read_pmu() that get the selected hardware
PMU conuter
samples/bpf: example of get selected PMU counter value
include/linux/bpf.h | 7 +++
include/linux/perf_event.h | 2 +
include/uapi/linux/bpf.h | 16 +++++
kernel/bpf/arraymap.c | 17 ++++++
kernel/bpf/hashtab.c | 27 +++++++++
kernel/bpf/helpers.c | 27 +++++++++
kernel/bpf/syscall.c | 81 ++++++++++++++++++++++++-
kernel/bpf/verifier.c | 9 +++
kernel/events/core.c | 22 +++++++
kernel/trace/bpf_trace.c | 2 +
samples/bpf/bpf_helpers.h | 2 +
samples/bpf/bpf_pmu_test.c | 143 ++++++++++++++++++++++++++++++++++++++++++++
12 files changed, 353 insertions(+), 2 deletions(-)
create mode 100644 samples/bpf/bpf_pmu_test.c
--
1.7.10.4