Message-ID: <20181017121140.GA31465@kernel.org>
Date: Wed, 17 Oct 2018 09:11:40 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Song Liu <liu.song.a23@...il.com>
Cc: David Ahern <dsahern@...il.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Alexei Starovoitov <ast@...nel.org>,
Alexey Budankov <alexey.budankov@...ux.intel.com>,
"David S . Miller" <davem@...emloft.net>,
Daniel Borkmann <daniel@...earbox.net>,
Namhyung Kim <namhyung@...nel.org>,
Jiri Olsa <jolsa@...nel.org>,
Networking <netdev@...r.kernel.org>, kernel-team@...com
Subject: Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog
load/unload
Adding Alexey, Jiri and Namhyung as they worked/are working on
multithreading 'perf record'.
Em Tue, Oct 16, 2018 at 11:43:11PM -0700, Song Liu escreveu:
> On Tue, Oct 16, 2018 at 4:43 PM David Ahern <dsahern@...il.com> wrote:
> > On 10/15/18 4:33 PM, Song Liu wrote:
> > > I am working with Alexei on the idea of fetching BPF program information via
> > > BPF_OBJ_GET_INFO_BY_FD cmd. I added PERF_RECORD_BPF_EVENT
> > > to perf_event_type, and dumped these events to perf event ring buffer.
> > > I found that perf will not process events until the end of perf-record:
> > > root@...t-test:~# ~/perf record -ag -- sleep 10
> > > ...... 10 seconds later
> > > [ perf record: Woken up 34 times to write data ]
> > > machine__process_bpf_event: prog_id 6 loaded
> > > machine__process_bpf_event: prog_id 6 unloaded
> > > [ perf record: Captured and wrote 9.337 MB perf.data (93178 samples) ]
> > > In this example, the bpf program was loaded and then unloaded in
> > > another terminal. When machine__process_bpf_event() processes
> > > the load event, the bpf program is already unloaded. Therefore,
> > > machine__process_bpf_event() will not be able to get information
> > > about the program via BPF_OBJ_GET_INFO_BY_FD cmd.
> > > To solve this problem, we will need to run BPF_OBJ_GET_INFO_BY_FD
> > > as soon as perf gets the event from the kernel. I looked around the perf
> > > code for a while. But I haven't found a good example where some
> > > events are processed before the end of perf-record. Could you
> > > please help me with this?
> > perf record does not process events as they are generated. Its sole job
> > is pushing data from the maps to a file as fast as possible, meaning in
> > bulk, based on the current read and write locations.
> > Adding code to process events will add significant overhead to the
> > record command and will not really solve your race problem.
> I agree that processing events while recording has significant overhead.
> In this case, perf user space needs to know details about the jited BPF
> program. It is impossible to pass all these details to user space through
> the relatively stable ring_buffer API. Therefore, some processing of the
> data is necessary (get bpf prog_id from ring buffer, and then fetch program
> details via BPF_OBJ_GET_INFO_BY_FD).
> I have some ideas on processing important data with relatively low overhead.
> Let me try to implement them.
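For reference, the two-step lookup being discussed (prog_id to fd, then
fd to info) could look roughly like the sketch below. It is untested
against this patch set. The helper names sys_bpf() and
bpf_prog_info_by_id() are made up for illustration; only the bpf(2)
commands and the union bpf_attr fields come from <linux/bpf.h>. The first
step normally needs CAP_SYS_ADMIN and fails with ENOENT once the program
has been unloaded, which is the race described above.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw bpf(2) syscall (hypothetical helper name). */
static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return (int)syscall(__NR_bpf, cmd, attr, size);
}

/*
 * Hypothetical helper: look up a loaded BPF program by id and fill 'info'.
 * Returns 0 on success, -1 with errno set on failure (for instance ENOENT
 * when the program was already unloaded).
 */
int bpf_prog_info_by_id(uint32_t prog_id, struct bpf_prog_info *info)
{
	union bpf_attr attr;
	int fd, err, saved_errno;

	/* Step 1: turn the prog_id taken from the event into an fd. */
	memset(&attr, 0, sizeof(attr));
	attr.prog_id = prog_id;
	fd = sys_bpf(BPF_PROG_GET_FD_BY_ID, &attr, sizeof(attr));
	if (fd < 0)
		return -1;

	/* Step 2: ask the kernel for the program details behind that fd. */
	memset(info, 0, sizeof(*info));
	memset(&attr, 0, sizeof(attr));
	attr.info.bpf_fd = fd;
	attr.info.info_len = sizeof(*info);
	attr.info.info = (uint64_t)(unsigned long)info;
	err = sys_bpf(BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr));

	/* Keep the errno of the info call, not of close(). */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return err;
}
```

The fd is only held for the duration of the call, so the window in which
the program can go away is between the kernel emitting the event and step
1, which is why the lookup has to happen close to event time.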
Well, you could have a separate thread processing just those kinds of
events: associate it with a dummy event where you only ask for
PERF_RECORD_BPF_EVENTs.
Here is how to set up the PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY
perf_event_attr:
[root@...enth ~]# perf record -vv -e dummy sleep 01
------------------------------------------------------------
perf_event_attr:
type 1
size 112
config 0x9
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|PERIOD
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 12046 cpu 0 group_fd -1 flags 0x8 = 4
sys_perf_event_open: pid 12046 cpu 1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 12046 cpu 2 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid 12046 cpu 3 group_fd -1 flags 0x8 = 8
mmap size 528384B
perf event ring buffer mmapped per cpu
Synthesizing TSC conversion information
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data ]
[root@...enth ~]#
[root@...enth ~]# perf evlist -v
dummy: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
[root@...enth ~]#
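In C, the attr dumped above corresponds roughly to the sketch below. The
function names dummy_attr_init() and dummy_event_open() are made up for
illustration; the field values just mirror the 'perf record -vv' dump, and
the flags value 0x8 in the sys_perf_event_open lines is
PERF_FLAG_FD_CLOEXEC. How to request only the proposed
PERF_RECORD_BPF_EVENTs is not shown, since that part of the interface is
still being discussed in this thread.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill 'attr' with the same values 'perf record -e dummy' printed above. */
void dummy_attr_init(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_SOFTWARE;	/* type 1 */
	attr->size = sizeof(*attr);		/* 112 in the dump; depends on headers */
	attr->config = PERF_COUNT_SW_DUMMY;	/* config 0x9 */
	attr->sample_freq = 4000;		/* interpreted as a frequency, see freq */
	attr->freq = 1;
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
			    PERF_SAMPLE_TIME | PERF_SAMPLE_PERIOD;
	attr->disabled = 1;
	attr->inherit = 1;
	attr->enable_on_exec = 1;
	attr->exclude_guest = 1;
	/* Side-band records: these are what the dummy event is really for. */
	attr->mmap = 1;
	attr->mmap2 = 1;
	attr->comm = 1;
	attr->comm_exec = 1;
	attr->task = 1;
	attr->sample_id_all = 1;
}

/*
 * Open the event for one (pid, cpu) pair, as in the sys_perf_event_open
 * lines above: no group leader, close-on-exec.
 * Returns the new fd, or -1 with errno set.
 */
int dummy_event_open(struct perf_event_attr *attr, pid_t pid, int cpu)
{
	return (int)syscall(__NR_perf_event_open, attr, pid, cpu,
			    -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
}
```

A dedicated thread would call dummy_event_open() once per CPU, mmap the
returned fds and poll only those ring buffers, leaving the bulk sample
path untouched.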
There is ongoing work on dumping one file per CPU and then, at
post-processing time, merging all those files to get ordering. One more
file, for these VIP events that require per-event processing, would be
ordered at that time together with all the other per-CPU files.
- Arnaldo