Message-ID: <CAPhsuW6F=Ct7k1Z7G0jX_U6QkJrH5D2tz38nR+XXWjy45SjaOg@mail.gmail.com>
Date: Wed, 17 Oct 2018 09:06:09 -0700
From: Song Liu <liu.song.a23@...il.com>
To: arnaldo.melo@...il.com
Cc: David Ahern <dsahern@...il.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Alexei Starovoitov <ast@...nel.org>,
Alexey Budankov <alexey.budankov@...ux.intel.com>,
"David S . Miller" <davem@...emloft.net>,
Daniel Borkmann <daniel@...earbox.net>, namhyung@...nel.org,
Jiri Olsa <jolsa@...nel.org>,
Networking <netdev@...r.kernel.org>, kernel-team@...com
Subject: Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog load/unload
On Wed, Oct 17, 2018 at 5:50 AM Arnaldo Carvalho de Melo
<arnaldo.melo@...il.com> wrote:
>
> Em Wed, Oct 17, 2018 at 09:11:40AM -0300, Arnaldo Carvalho de Melo escreveu:
> > Adding Alexey, Jiri and Namhyung as they worked/are working on
> > multithreading 'perf record'.
> >
> > Em Tue, Oct 16, 2018 at 11:43:11PM -0700, Song Liu escreveu:
> > > On Tue, Oct 16, 2018 at 4:43 PM David Ahern <dsahern@...il.com> wrote:
> > > > On 10/15/18 4:33 PM, Song Liu wrote:
> > > > > I am working with Alexei on the idea of fetching BPF program information via
> > > > > BPF_OBJ_GET_INFO_BY_FD cmd. I added PERF_RECORD_BPF_EVENT
> > > > > to perf_event_type, and dumped these events to perf event ring buffer.
> >
> > > > > I found that perf will not process events until the end of perf-record:
> >
> > > > > root@...t-test:~# ~/perf record -ag -- sleep 10
> > > > > ...... 10 seconds later
> > > > > [ perf record: Woken up 34 times to write data ]
> > > > > machine__process_bpf_event: prog_id 6 loaded
> > > > > machine__process_bpf_event: prog_id 6 unloaded
> > > > > [ perf record: Captured and wrote 9.337 MB perf.data (93178 samples) ]
> >
> > > > > In this example, the bpf program was loaded and then unloaded in
> > > > > another terminal. When machine__process_bpf_event() processes
> > > > > the load event, the bpf program is already unloaded. Therefore,
> > > > > machine__process_bpf_event() will not be able to get information
> > > > > about the program via BPF_OBJ_GET_INFO_BY_FD cmd.
> >
> > > > > To solve this problem, we will need to run BPF_OBJ_GET_INFO_BY_FD
> > > > > as soon as perf gets the event from the kernel. I looked around the perf
> > > > > code for a while. But I haven't found a good example where some
> > > > > events are processed before the end of perf-record. Could you
> > > > > please help me with this?
> >
> > > > perf record does not process events as they are generated. Its sole job
> > > > is pushing data from the maps to a file as fast as possible, meaning in
> > > > bulk based on current read and write locations.
> >
> > > > Adding code to process events will add significant overhead to the
> > > > record command and will not really solve your race problem.
> >
> > > I agree that processing events while recording has significant overhead.
> > > In this case, perf user space needs to know details about the jited BPF
> > > program. It is impossible to pass all these details to user space through
> > > the relatively stable ring_buffer API. Therefore, some processing of the
> > > data is necessary (get bpf prog_id from ring buffer, and then fetch program
> > > details via BPF_OBJ_GET_INFO_BY_FD).
> >
> > > I have some ideas on processing important data with relatively low overhead.
> > > Let me try to implement it.
> >
> > Well, you could have a separate thread processing just those kinds of
> > events, associate it with a dummy event where you only ask for
> > PERF_RECORD_BPF_EVENTs.
> >
> > Here is how to set up the PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY
> > perf_event_attr:
> >
> > [root@...enth ~]# perf record -vv -e dummy sleep 01
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 1
> > size 112
> > config 0x9
> > { sample_period, sample_freq } 4000
> > sample_type IP|TID|TIME|PERIOD
> > disabled 1
> > inherit 1
>
> These you would have disabled, no need for
> PERF_RECORD_{MMAP*,COMM,FORK,EXIT} just PERF_RECORD_BPF_EVENT
>
> > mmap 1
> > comm 1
> > task 1
> > mmap2 1
> > comm_exec 1
>
>
Thanks Arnaldo! This looks better than my original idea (using POLLPRI
to highlight special events). I will try to implement the BPF_EVENT in
this direction.
Song
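
For reference, the dummy-event setup suggested above could be sketched in C roughly as follows. This is only an illustration under stated assumptions, not code from the patch set: the helper name make_dummy_attr() is hypothetical, and the watermark and sample_id_all settings are a guess at what a low-latency side-band reader would want. The attribute bit that would request PERF_RECORD_BPF_EVENT is the thing being proposed in this thread, so it is left as a comment rather than set.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: fill in a perf_event_attr for a side-band
 * "dummy" software event (type 1, config 0x9 in the dump above).
 * The dummy event never produces samples; it exists only to carry
 * the record types requested through the attr bits.
 */
struct perf_event_attr make_dummy_attr(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_DUMMY;

	/* Assumption: timestamps on non-sample records help ordering. */
	attr.sample_type = PERF_SAMPLE_TIME;
	attr.sample_id_all = 1;

	/* Assumption: wake the reader thread as soon as any data arrives. */
	attr.watermark = 1;
	attr.wakeup_watermark = 1;

	/*
	 * As noted above, PERF_RECORD_{MMAP*,COMM,FORK,EXIT} are not
	 * needed here, so mmap, mmap2, comm, comm_exec and task stay 0.
	 * The bit requesting PERF_RECORD_BPF_EVENT would be set here
	 * once the kernel side defines it.
	 */
	assert(attr.mmap == 0 && attr.comm == 0 && attr.task == 0);

	return attr;
}
```

The resulting attr would be passed to perf_event_open(2) once per CPU, with a separate thread polling the resulting ring buffers and calling BPF_OBJ_GET_INFO_BY_FD for each program id it sees.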