linux-kernel - Re: perf bpf examples

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <577F8473.1090208@huawei.com>
Date:	Fri, 8 Jul 2016 18:46:11 +0800
From:	"Wangnan (F)" <wangnan0@...wei.com>
To:	Brendan Gregg <brendan.d.gregg@...il.com>
CC:	"linux-perf-use." <linux-perf-users@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Arnaldo Carvalho de Melo" <acme@...nel.org>,
	Alexei Starovoitov <ast@...nel.org>
Subject: Re: perf bpf examples



On 2016/7/8 15:57, Brendan Gregg wrote:
> On Thu, Jul 7, 2016 at 9:18 PM, Wangnan (F) <wangnan0@...wei.com> wrote:
>>
>> On 2016/7/8 1:58, Brendan Gregg wrote:
>>> On Thu, Jul 7, 2016 at 10:54 AM, Brendan Gregg
>>> <brendan.d.gregg@...il.com> wrote:
>>>> On Wed, Jul 6, 2016 at 6:49 PM, Wangnan (F) <wangnan0@...wei.com> wrote:
> [...]
>>> ... Also, has anyone looked into perf sampling (-F 99) with bpf yet?
>>> Thanks,
>>
>> Theoretically, BPF program is an additional filter to
>> decide whetier an event should be filtered out or pass to perf. -F 99
>> is another filter, which drops samples to ensure the frequence.
>> Filters works together. The full graph should be:
>>
>>   BPF --> traditional filter --> proc (system wide of proc specific) -->
>> period
>>
>> See the example at the end of this mail. The BPF program returns 0 for half
>> of
>> the events, and the result should be symmetrical. We can get similar result
>> without
>> -F:
>>
>> # ~/perf record -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero
>> of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 11.9908 s, 358 MB/s
>> [ perf record: Woken up 28 times to write data ]
>> [ perf record: Captured and wrote 303.915 MB perf.data (4194449 samples) ]
>> #
>> root@...Lenovo-Product:~# ~/perf record -a --clang-opt '-DCATCH_EVEN' -e
>> ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 12.1154 s, 355 MB/s
>> [ perf record: Woken up 54 times to write data ]
>> [ perf record: Captured and wrote 303.933 MB perf.data (4194347 samples) ]
>>
>>
>> With -F99 added:
>>
>> # ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd
>> if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 9.60126 s, 447 MB/s
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.402 MB perf.data (35 samples) ]
>> # ~/perf record -F99 -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd
>> if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 9.76719 s, 440 MB/s
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.399 MB perf.data (37 samples) ]
> That looks like it's doing two different things: -F99, and a
> sampling.c script (SEC("func=sys_read")).
>
> I mean just an -F99 that executes a BPF program on each sample. My
> most common use for perf is:
>
> perf record -F 99 -a -g -- sleep 30
> perf report (or perf script, for making flame graphs)
>
> But this uses perf.data as an intermediate file. With the recent
> BPF_MAP_TYPE_STACK_TRACE, we could frequency count stack traces in
> kernel context, and just dump a report. Much more efficient. And
> improving a very common perf one-liner.

You can't attach BPF script to samples other than kprobe and tracepoints.
When you use 'perf record -F99 -a -g -- sleep 30', you are sampling on
'cycles:ppp' event. This is a hardware PMU event.

If we find a kprobe or tracepoint event which would be triggered 99 times
in each second, we can utilize BPF_MAP_TYPE_STACK_TRACE and 
bpf_get_stackid().

Thank you.