Date:	Fri, 17 Jul 2015 15:56:08 -0700
From:	Alexei Starovoitov <ast@...mgrid.com>
To:	kaixu xia <xiakaixu@...wei.com>, davem@...emloft.net,
	acme@...nel.org, mingo@...hat.com, a.p.zijlstra@...llo.nl,
	masami.hiramatsu.pt@...achi.com, jolsa@...nel.org
Cc:	wangnan0@...wei.com, linux-kernel@...r.kernel.org,
	pi3orama@....com, hekuang@...wei.com
Subject: Re: [RFC PATCH 0/6] bpf: Introduce the new ability of eBPF programs
 to access hardware PMU counter

On 7/17/15 3:43 AM, kaixu xia wrote:
> There are many useful PMUs provided by x86 and other architectures. By
> combining PMUs, kprobes and eBPF programs together, many interesting things
> can be done. For example, by probing at sched:sched_switch we can
> measure IPC changes between different processes by watching the 'cycles' PMU
> counter; by probing at the entry and exit points of a kernel function we are
> able to compute the cache miss rate for that function by collecting the
> 'cache-misses' counter and comparing the differences. In summary, we can
> define the begin and end points of a procedure, insert kprobes on them,
> attach two BPF programs and let them collect the specific PMU counter.

that would definitely be a useful feature.
As far as the overall design goes, I think it should be done slightly
differently. The addition of 'flags' to all maps is a bit hacky and it
seems to have a few holes. It's better to reuse the 'store fds into maps'
code that prog_array is doing. You can add a new map type
BPF_MAP_TYPE_PERF_EVENT_ARRAY and reuse most of the arraymap.c code.
The program also wouldn't need to do lookup+read_pmu, so instead of:
   r0 = 0 (the chosen key: CPU-0)
   *(u32 *)(fp - 4) = r0
   value = bpf_map_lookup_elem(map_fd, fp - 4);
   count = bpf_read_pmu(value);
you will be able to do:
   count = bpf_perf_event_read(perf_event_array_map_fd, index)
which will be faster.
note, I'd prefer 'bpf_perf_event_read' name for the helper.
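
In C-like terms, the difference between the two patterns looks roughly
like this (a stand-alone userspace mock; the stubs, the fake counters
and the map fd are placeholders — only the call shapes mirror the mail):

```c
#include <stdint.h>
#include <stddef.h>

/* Userspace mock comparing the two call patterns.  All three helpers
 * are stubs so this compiles stand-alone; the real ones live in the
 * kernel, and bpf_perf_event_read() is only a proposal here. */

static uint64_t fake_counter[4] = { 100, 200, 300, 400 };

/* stub: map lookup keyed by CPU index */
static void *bpf_map_lookup_elem(int map_fd, const void *key)
{
	(void)map_fd;
	return &fake_counter[*(const uint32_t *)key];
}

/* stub for the patch set's second step */
static uint64_t bpf_read_pmu(void *value)
{
	return *(uint64_t *)value;
}

/* stub for the proposed single helper */
static uint64_t bpf_perf_event_read(int map_fd, uint32_t index)
{
	(void)map_fd;
	return fake_counter[index];
}

/* current patch set: two helper calls */
static uint64_t read_via_lookup(int map_fd, uint32_t cpu)
{
	void *value = bpf_map_lookup_elem(map_fd, &cpu);

	return value ? bpf_read_pmu(value) : 0;
}

/* proposed: one helper call */
static uint64_t read_direct(int map_fd, uint32_t cpu)
{
	return bpf_perf_event_read(map_fd, cpu);
}
```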

Then inside the helper we really cannot take a mutex, sleep or use smp_call,
but since programs are always executed with preemption disabled
and never from NMI, I think something like the following should work:
u64 bpf_perf_event_read(u64 r1, u64 index,...)
{
   struct bpf_perf_event_array *array = (void *) (long) r1;
   struct perf_event *event;

   if (unlikely(index >= array->map.max_entries))
      return -EINVAL;
   event = array->events[index];
   /* only read counters that are active on this very CPU */
   if (event->state != PERF_EVENT_STATE_ACTIVE)
      return -EINVAL;
   if (event->oncpu != raw_smp_processor_id())
      return -EINVAL;
   __perf_event_read(event);
   return perf_event_count(event);
}
not sure whether we need to disable irqs around __perf_event_read;
I think it should be ok without.
Also, when storing an FD into the perf_event_array you'd need
to filter out all the crazy events. I would limit it to a few
basic types first.
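
As a sketch of the kind of filter meant here (the function name and the
exact policy are illustrative, not the actual patch; the PERF_TYPE_*
values mirror <linux/perf_event.h>, HARDWARE=0 and RAW=4):

```c
#include <stdbool.h>
#include <stdint.h>

/* values as defined in include/uapi/linux/perf_event.h */
enum { PERF_TYPE_HARDWARE = 0, PERF_TYPE_RAW = 4 };

/* hypothetical check run when an event FD is stored into the array */
static bool perf_event_fd_allowed(uint32_t type)
{
	switch (type) {
	case PERF_TYPE_HARDWARE:	/* cycles, cache-misses, ... */
	case PERF_TYPE_RAW:		/* raw PMU config values */
		return true;
	default:			/* software, tracepoint, breakpoint, ... */
		return false;
	}
}
```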

btw, make sure you run your tests with lockdep and the other debug options on.
And for the sample code please use C for the bpf program. Not many
people can read bpf asm ;)
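
For reference, such a sample in C might look roughly like this. This is
a hedged sketch: SEC(), the map definition and the helpers are mocked
here so the file compiles stand-alone; in a real sample they come from
linux/bpf.h and the sample headers, and both the map type and the
helper are still only proposals from this mail:

```c
#include <stdint.h>

/* the real macro places symbols in a named ELF section */
#define SEC(name)

typedef uint32_t u32;
typedef uint64_t u64;

/* mocked map definition; normally provided by the bpf headers */
struct bpf_map_def { u32 type, key_size, value_size, max_entries; };
#define BPF_MAP_TYPE_PERF_EVENT_ARRAY 0	/* proposed map type; id assumed */

struct bpf_map_def SEC("maps") pmu_counters = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size = sizeof(u32),
	.value_size = sizeof(u32),
	.max_entries = 32,	/* one slot per possible CPU */
};

/* stubs for the kernel helpers used below */
static u32 bpf_get_smp_processor_id(void) { return 0; }
static u64 bpf_perf_event_read(struct bpf_map_def *map, u32 index)
{
	(void)map; (void)index;
	return 0;
}

/* read the counter for the current CPU on every hit of the probe */
SEC("kprobe/sys_write")
int count_cycles(void *ctx)
{
	u32 cpu = bpf_get_smp_processor_id();
	u64 cycles = bpf_perf_event_read(&pmu_counters, cpu);

	(void)ctx; (void)cycles;	/* a real program would aggregate the delta */
	return 0;
}

SEC("license") char _license[] = "GPL";
```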

