Message-ID: <5600C7F7.70702@huawei.com>
Date:	Tue, 22 Sep 2015 11:16:07 +0800
From:	Yunlong Song <yunlong.song@...wei.com>
To:	<a.p.zijlstra@...llo.nl>, <paulus@...ba.org>, <mingo@...hat.com>,
	<acme@...nel.org>, <rostedt@...dmis.org>, <ast@...nel.org>,
	<jolsa@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
	<masami.hiramatsu.pt@...achi.com>, <adrian.hunter@...el.com>,
	David Ahern <dsahern@...il.com>, <bp@...en8.de>,
	<rric@...nel.org>
CC:	<linux-kernel@...r.kernel.org>, <wangnan0@...wei.com>
Subject: [RFC resend] Perf: Trigger and dump sample info to perf.data from
 user space ring buffer

[Problem Background]

We want to run perf in daemon mode and collect traces when an exception
(e.g., a machine crash or a drop in application performance) occurs. Perf may
have to run for a long time (days, weeks, or even months), because we do not
know when the exception will occur, only that it eventually will (especially
for a beta product). If we simply use "perf record" as usual, two problems
grow over time: 1) writing perf.data generates a large amount of I/O, which may
hurt performance considerably; 2) perf.data itself keeps growing. Although eBPF
can reduce the traces in the normal case, in our case perf runs in daemon mode
for a long time, so the remaining traces still accumulate.


[One Solution]

In fact, we only need the sample info generated during a short window just
before the exception occurs. We do not care about sample info from any other
time. So perhaps we should change the way perf produces its perf.data, as
follows:
 1 Let perf allocate a user space ring buffer of a reasonable size, big
   enough to store all the tracing info we care about (for that window) before
   the exception occurs;
 2 Write the sample info to the user space ring buffer. The size of the ring
   buffer is constant, so newer sample info overwrites older sample info;
 3 On some kind of trigger (maybe an eBPF event, a signal, or socket
   communication) caused by the exception, dump all the tracing info held in
   the user space ring buffer to perf.data.sample.TIME#
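
The overwrite behaviour in step 2 can be sketched as follows. This is only an
illustration in Python, not perf code: the class name SampleRing, its methods,
and the choice to budget by bytes are all assumptions made for the example.

```python
from collections import deque


class SampleRing:
    """Hypothetical byte-budgeted ring: newest records evict the oldest."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.records = deque()

    def push(self, record):
        # A record larger than the whole buffer can never fit; drop it.
        if len(record) > self.capacity_bytes:
            return
        # Evict the oldest records until the new one fits.
        while self.used_bytes + len(record) > self.capacity_bytes:
            oldest = self.records.popleft()
            self.used_bytes -= len(oldest)
        self.records.append(record)
        self.used_bytes += len(record)

    def drain(self):
        # Hand back everything currently buffered (oldest first) and reset.
        out = list(self.records)
        self.records.clear()
        self.used_bytes = 0
        return out
```

With a 10-byte budget, pushing three 4-byte records leaves only the two newest
in the buffer, which is the content a trigger would later dump.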


[Use Style]
	
We can add an option (such as "-M size" or "--memory size") to set the
size of the user space ring buffer and activate the user space ring buffer mode
described above. For convenience, we can add "--daemon" to make perf run as a
daemon.
# perf record -M size -e ebpf.o -e cycles -g -F 100 -a sleep 1000000 &
Or
# perf record -M size -e ebpf.o -e cycles -g -F 100 -a --daemon

When the exception occurs, a signal is sent to perf (an eBPF event or socket
communication could be used instead)
# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1

When the 2nd exception appears
# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2

......

When the Nth exception appears
# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2 … perf.data.sample.TIMEN

We can use perf report or perf script to analyze each perf.data.sample.TIME#

Or finally, we can kill perf and combine perf.data.auxiliary with all the
perf.data.sample.TIME# to create an all-in-one perf.data
# kill -SIGUSR2 1234
# ls
perf.data
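
The signal-driven dump shown above could look roughly like the sketch below.
It is written in Python purely for illustration and does not use any perf
API. The names DumpTrigger, dump_ring and record_loop are hypothetical, and
using a nanosecond timestamp as the TIME# suffix is an assumption. The handler
only sets a flag, and the file I/O happens in the recording loop.

```python
import os
import signal
import time
from collections import deque


class DumpTrigger:
    """Hypothetical signal handler object: it only records the request."""

    def __init__(self):
        self.requested = False

    def __call__(self, signum, frame):
        self.requested = True


def dump_ring(ring, directory="."):
    # Assumption: TIME# is rendered as a nanosecond timestamp.
    path = os.path.join(directory, "perf.data.sample.%d" % time.time_ns())
    with open(path, "wb") as f:
        for record in ring:
            f.write(record)
    ring.clear()
    return path


def record_loop(samples, ring, trigger, directory="."):
    # Buffer every sample; write a file only when the trigger has fired.
    paths = []
    for sample in samples:
        ring.append(sample)
        if trigger.requested:
            trigger.requested = False
            paths.append(dump_ring(ring, directory))
    return paths


def install_trigger(trigger):
    # POSIX only: SIGUSR1 requests a dump, as in the example above.
    signal.signal(signal.SIGUSR1, trigger)
```

A caller would create `ring = deque(maxlen=N)` and a `DumpTrigger`, call
`install_trigger`, and then feed samples to `record_loop`. Nothing is written
to disk until SIGUSR1 arrives.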


[To Do]

If the idea above is acceptable, we want to implement it in the following steps:
 1 Develop perf's user space ring buffer, in which newer sample info replaces
   older sample info.
 2 Classify the tracing info into two kinds. The first kind is the sample events
   themselves, of which we only need those generated (for a while) just before
   the exception occurs. We call this Optional tracing info, and perf should
   write it to the user space ring buffer instead of perf.data. The second kind
   is the tracing info required to analyze the sample events, such as
   mmap_event, which provides the DSO-related info. We call this Auxiliary
   tracing info, and perf should write it to perf.data.auxiliary or directly to
   perf.data as before.
 3 Develop a trigger for perf that makes it dump its user space ring buffer to
   perf.data.sample.TIME#, or append the contents to perf.data. The trigger may
   offer three interfaces: eBPF event, signal, and socket communication.
 4 Make perf report, perf script, etc. able to analyze perf.data.auxiliary,
   perf.data.sample.TIME#, or the final synthetic perf.data combined from
   perf.data.auxiliary and all the perf.data.sample.TIME#
 5 For daemon mode, perf should also support running in the background all
   the time and being stopped by a trigger.
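
Steps 2 and 4 can be illustrated with a small sketch. It is Python for
illustration only, and everything in it is an assumption: records are modelled
as (kind, timestamp, payload) tuples, the names route and combine are
hypothetical, and merging by timestamp is just one possible way to build the
combined perf.data.

```python
from collections import deque

# Assumption: only sample events count as Optional tracing info.
OPTIONAL_KINDS = {"SAMPLE"}


def route(record, ring, auxiliary):
    """Send Optional info to the ring buffer, Auxiliary info to a kept list."""
    kind, timestamp, payload = record
    if kind in OPTIONAL_KINDS:
        ring.append(record)       # may overwrite the oldest sample
    else:
        auxiliary.append(record)  # always kept, e.g. mmap events


def combine(auxiliary, dumps):
    """Merge the auxiliary records with every dumped sample file by time."""
    merged = list(auxiliary)
    for dump in dumps:
        merged.extend(dump)
    merged.sort(key=lambda record: record[1])
    return merged
```

Here the mmap record survives even when older samples have been overwritten,
so the samples that are eventually dumped can still be resolved against it.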


[Conclusion]

In short, we propose a mechanism to make perf's tracing more refined and more
efficient. We regard the size of perf.data and the cost of writing it as
expensive resources, which had better be spent carefully and only on the
exception we are targeting. This mechanism can be used in both daemon mode
and non-daemon mode. It offers another way to filter tracing events,
approaching the problem from a different angle than eBPF.


-- 
Thanks,
Yunlong Song
