netdev - Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog load/unload

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <97707bf2-ace3-18a5-3621-f69122dd93df@fb.com>
Date:   Wed, 17 Oct 2018 16:36:08 +0000
From:   Alexei Starovoitov <ast@...com>
To:     David Ahern <dsahern@...il.com>, Song Liu <liu.song.a23@...il.com>
CC:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        "acme@...nel.org" <acme@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Alexei Starovoitov <ast@...nel.org>,
        "David S . Miller" <davem@...emloft.net>,
        Daniel Borkmann <daniel@...earbox.net>,
        Networking <netdev@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog
 load/unload

On 10/17/18 8:09 AM, David Ahern wrote:
> On 10/16/18 11:43 PM, Song Liu wrote:
>> I agree that processing events while recording has significant overhead.
>> In this case, perf user space need to know details about the the jited BPF
>> program. It is impossible to pass all these details to user space through
>> the relatively stable ring_buffer API. Therefore, some processing of the
>> data is necessary (get bpf prog_id from ring buffer, and then fetch program
>> details via BPF_OBJ_GET_INFO_BY_FD.
>>
>> I have some idea on processing important data with relatively low overhead.
>> Let me try implement it.
>>
>
> As I understand it, you want this series:
>
>  kernel: add event to perf buffer on bpf prog load
>
>  userspace: perf reads the event and grabs information about the program
> from the fd
>
> Is that correct?
>
> Userpsace is not awakened immediately when an event is added the the
> ring. It is awakened once the number of events crosses a watermark. That
> means there is an unknown - and potentially long - time window where the
> program can be unloaded before perf reads the event.
>
> So no matter what you do expecting perf record to be able to process the
> event quickly is an unreasonable expectation.

yes... unless we go with threaded model as Arnaldo suggested and use
single event as a watermark to wakeup our perf thread.
In such case there is still a race window between user space waking up
and doing _single_ bpf_get_fd_from_id() call to hold that prog
and some other process trying to instantly unload the prog it
just loaded.
I think such race window is extremely tiny and if perf misses
those load/unload events it's a good thing, since there is no chance
that normal pmu event samples would be happening during prog execution.

The alternative approach with no race window at all is to burden kernel
RECORD_* events with _all_ information about bpf prog. Which is jited
addresses, jited image itself, info about all subprogs, info about line
info, all BTF data, etc. As I said earlier I'm strongly against such
RECORD_* bloating.
Instead we need to find a way to process new RECORD_BPF events with
single prog_id field in perf user space with minimal race
and threaded approach sounds like a win to me.