linux-kernel - Re: [RFC 00/13] perf bpf: Add support to run BEGIN/END code

Message-ID: <20180312135628.GB4882@kernel.org>
Date:   Mon, 12 Mar 2018 10:56:28 -0300
From:   Arnaldo Carvalho de Melo <acme@...nel.org>
To:     Jiri Olsa <jolsa@...hat.com>
Cc:     Brendan Gregg <bgregg@...flix.com>,
        Stanislav Kozina <skozina@...hat.com>,
        "Frank Ch. Eigler" <fche@...hat.com>,
        Will Cohen <wcohen@...hat.com>,
        Eugene Syromiatnikov <esyromia@...hat.com>,
        Jerome Marchand <jmarchan@...hat.com>,
        lkml <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        David Ahern <dsahern@...il.com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Jiri Olsa <jolsa@...nel.org>, Wang Nan <wangnan0@...wei.com>,
        Alexei Starovoitov <ast@...com>
Subject: Re: [RFC 00/13] perf bpf: Add support to run BEGIN/END code

On Mon, Mar 12, 2018 at 12:17:05PM +0100, Jiri Olsa wrote:
> adding Alexei and Wang to the loop
> 
> On Mon, Mar 12, 2018 at 10:43:00AM +0100, Jiri Olsa wrote:
> > hi,
> > this is *RFC* and the following patchset is very rough
> > and ugly 'proof of concept'-kind-of-toy code. I'm mostly
> > interested in opinions on whether this could be useful in
> > your current eBPF usage.
> > 
> > Currently we can load eBPF code within the record command
> > and attach it to an event. We have 2 ways of communicating
> > the data back to the user: the bpf-output event that goes to
> > perf.data, or 'trace_printk' output in the tracefs buffer.
> > 
> > AFAICS we're not covering quite a large usage base that runs
> > code before the probe starts and after it finishes, to set up,
> > collect and display the collected data.
> > 
> > This patchset adds support for running BEGIN and END
> > code snippets before and after the eBPF probe is loaded.

Right, with all the code that Wang contributed, and reusing that
begin/end concept from systemtap, it was easy to do and not that much
code was added, so I don't see a reason for this not to be merged.
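To make the ordering concrete, here is a purely userspace model of the BEGIN/probe/END lifecycle the RFC describes: BEGIN runs once before any event fires and sets up state, the probe body runs once per event, and END runs once at the end to collect and display the result. The function names (`begin_code`, `probe_hit`, `end_code`) and the small counter array standing in for a per-CPU BPF map are hypothetical; this is not the patchset's interface, only a sketch of the semantics.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a per-CPU BPF array map shared by all phases. */
#define NR_CPUS_MODEL 4
static unsigned long long hits[NR_CPUS_MODEL];

/* BEGIN: runs once, before the probe is attached; sets up state. */
static void begin_code(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		hits[cpu] = 0;
}

/* Probe body: runs once per event, on the CPU where the event fired. */
static void probe_hit(unsigned int cpu)
{
	assert(cpu < NR_CPUS_MODEL); /* the model only knows NR_CPUS_MODEL CPUs */
	hits[cpu]++;
}

/* END: runs once, after the probe is done; collects and displays. */
static unsigned long long end_code(void)
{
	unsigned long long total = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		total += hits[cpu];
	printf("total hits: %llu\n", total);
	return total;
}
```

Calling begin_code(), then probe_hit() once per event, then end_code() prints the aggregated count; in the real setup the middle phase would be the kernel running the attached eBPF program.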

On top of this patchset, I think that the restricted C code used to
write the eBPF utilities should be simplified. I've toyed with this
from time to time, for instance:

[root@...et bpf]# cat o_cloexec.c 
#include "bpf.h"
#include "stdio.h"

#define O_CLOEXEC       0x80000

int syscall_enter(openat)
{
	char filename[256];
	int flags = syscall_field_int(flags, 32);
	int len = syscall_field_str(filename, 24);

	if (!(flags & O_CLOEXEC))
		return 0;

	perf_stdout(filename, len);
	return 1;
}

[root@...et bpf]# perf trace -e openat,o_cloexec.c
     0.573 (         ): __bpf_stdout__:/etc/ld.so.cache....)
     0.576 (         ): syscalls:sys_enter_openat:dfd: 0xffffffffffffff9c, filename: 0x7fc4de411563, flags: 0x00080000, mode: 0x00000000)
     0.579 ( 0.013 ms): sh/17728 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC           ) = 3
     0.620 (         ): __bpf_stdout__:/lib64/libtinfo.so.6........)
     0.622 (         ): syscalls:sys_enter_openat:dfd: 0xffffffffffffff9c, filename: 0x7fc4de619ce0, flags: 0x00080000, mode: 0x00000000)
     0.624 ( 0.013 ms): sh/17728 openat(dfd: CWD, filename: /lib64/libtinfo.so.6, flags: CLOEXEC       ) = 3
     0.705 (         ): __bpf_stdout__:/lib64/libdl.so.2...)
     0.708 (         ): syscalls:sys_enter_openat:dfd: 0xffffffffffffff9c, filename: 0x7fc4de5ef4c0, flags: 0x00080000, mode: 0x00000000)
     0.710 ( 0.058 ms): sh/17728 openat(dfd: CWD, filename: /lib64/libdl.so.2, flags: CLOEXEC          ) = 3
     0.852 (         ): __bpf_stdout__:/lib64/libc.so.6....)
     0.857 (         ): syscalls:sys_enter_openat:dfd: 0xffffffffffffff9c, filename: 0x7fc4de5ef9a0, flags: 0x00080000, mode: 0x00000000)
     0.860 ( 0.021 ms): sh/17728 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC           ) = 3
^C
[root@...et bpf]#

Hiding details such as:

[root@...et bpf]# cat stdio.h 
struct bpf_map_def SEC("maps") __bpf_stdout__ = {
       .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
       .key_size = sizeof(int),
       .value_size = sizeof(u32),
       .max_entries = __NR_CPUS__,
};

#define perf_stdout(from, len) \
	perf_event_output(ctx, &__bpf_stdout__, BPF_F_CURRENT_CPU, \
			  &from, len & (sizeof(from) - 1));
[root@...et bpf]#

And 'perf trace' will set up the "bpf-output" event, etc.
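The `len & (sizeof(from) - 1)` in perf_stdout() is there so the verifier can see that the length passed to perf_event_output() never exceeds the buffer. It only acts as an upper bound when sizeof(from) is a power of two, as with the 256-byte filename buffer, and it wraps rather than clamps. If I read bpf_probe_read_str() right, it returns the number of bytes copied including the trailing NUL, or a negative error, so a 255-character path (return value 256) would be emitted as 0 bytes and an error would become a bogus nonzero length. A userspace sketch of just that arithmetic (the `masked_len` helper is hypothetical, not part of bpf.h or stdio.h above):

```c
#include <assert.h>
#include <stddef.h>

/* Same bound as perf_stdout(): len & (size - 1).
 * Only a valid upper bound when size is a power of two. */
static size_t masked_len(long len, size_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	/* A negative len converts to size_t modulo 2^N, so an error code
	 * ends up as some nonzero value below size after masking. */
	return (size_t)len & (size - 1);
}
```

For "/etc/ld.so.cache" from the session above (16 characters plus the NUL, so 17) the mask is a no-op; only the boundary and error cases change the value.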

And the other macros:

#define SEC(NAME) __attribute__((section(NAME), used))

#define pid_map(name, value_type) \
struct bpf_map_def SEC("maps") name = { \
        .type        = BPF_MAP_TYPE_HASH, \
        .key_size    = sizeof(u64), \
        .value_size  = sizeof(value_type), \
        .max_entries = 500, \
}

#define syscall_enter(name) \
        SEC("syscalls:sys_enter_" #name) syscall_enter_ ## name(void *ctx)

#define syscall_exit(name) \
        SEC("syscalls:sys_exit_" #name) syscall_exit_ ## name(void *ctx)

#define syscall_field_str(field, offset) \
        ({ char *__ptr = *((char **)(ctx + offset)); \
           bpf_probe_read_str(field, sizeof(field), __ptr); })

#define syscall_field_int(field, offset) \
        ({ int *__ptr = (int *)(ctx + offset); \
           bpf_probe_read(&field, sizeof(field), __ptr); field; })
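The hardcoded 24 and 32 in o_cloexec.c are byte offsets into the sys_enter_openat tracepoint record, where each syscall argument sits in an unsigned long slot after the 8-byte common header and the syscall number. A userspace sketch of that layout, assuming an LP64 target such as x86_64; the struct is illustrative, not a kernel header, and `syscall_nr` is renamed from the format file's `__syscall_nr` to avoid a reserved identifier:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of the sys_enter_openat record on LP64 (e.g. x86_64).
 * The authoritative layout is the tracepoint format file. */
struct sys_enter_openat_args {
	unsigned short common_type;          /* offset 0 */
	unsigned char  common_flags;         /* offset 2 */
	unsigned char  common_preempt_count; /* offset 3 */
	int            common_pid;           /* offset 4 */
	int            syscall_nr;           /* offset 8, "__syscall_nr" in the format file */
	/* 4 bytes of padding so the next unsigned long is 8-byte aligned */
	unsigned long  dfd;                  /* offset 16 */
	unsigned long  filename;             /* offset 24: user pointer to the path */
	unsigned long  flags;                /* offset 32: 8-byte slot; an int read gets
					        the low half on little-endian */
	unsigned long  mode;                 /* offset 40 */
};

/* Fails to compile if the LP64 layout assumption does not hold. */
static_assert(offsetof(struct sys_enter_openat_args, filename) == 24,
	      "filename expected at offset 24");
static_assert(offsetof(struct sys_enter_openat_args, flags) == 32,
	      "flags expected at offset 32");
```

Casting ctx to a struct like this would let the compiler compute the offsets, but it still bakes the layout in at build time instead of taking it from the running system.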

While this hides some of the details, it still hardcodes the field
offsets, so it has to be used with that in mind. I was trying to read up
on clang internals to find some preprocessing trick that would
automagically make the tracepoint fields accessible as local variables,
reading the tracepoint format files from the running system, or from the
description stored in the perf.data header when running these things on
perf.data files.
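One piece of that idea, getting field offsets from the tracepoint format files instead of hardcoding them, is plain string parsing. A minimal sketch, assuming the usual `field:<type> <name>;\toffset:<n>;\tsize:<n>;\tsigned:<n>;` line shape found in events/syscalls/sys_enter_openat/format under the tracefs mount (e.g. /sys/kernel/debug/tracing); the helper names are hypothetical and array fields such as `char comm[16]` are not handled:

```c
#include <stdio.h>
#include <string.h>

/* Parse one "field:" line of a tracepoint format file, e.g.
 *   "\tfield:const char * filename;\toffset:24;\tsize:8;\tsigned:0;"
 * On success copies the field name into name, stores the offset, returns 0. */
static int parse_field_line(const char *line, char *name, size_t name_len,
			    int *offset)
{
	const char *p = strstr(line, "field:");
	const char *semi, *start, *off;
	size_t n;

	if (!p)
		return -1;
	p += strlen("field:");
	semi = strchr(p, ';');
	if (!semi)
		return -1;

	/* The name is the last token before ';', after a space or a '*'. */
	start = semi;
	while (start > p && start[-1] != ' ' && start[-1] != '*')
		start--;
	n = (size_t)(semi - start);
	if (n == 0 || n >= name_len)
		return -1;
	memcpy(name, start, n);
	name[n] = '\0';

	off = strstr(semi, "offset:");
	if (!off || sscanf(off, "offset:%d", offset) != 1)
		return -1;
	return 0;
}

/* Look up a field's offset among format-file lines; returns -1 if absent. */
static int field_offset(const char *const *lines, size_t nr_lines,
			const char *field)
{
	char name[64];
	int offset;

	for (size_t i = 0; i < nr_lines; i++) {
		if (parse_field_line(lines[i], name, sizeof(name), &offset) == 0 &&
		    strcmp(name, field) == 0)
			return offset;
	}
	return -1;
}
```

Non-field lines such as "name:", "ID:" or "print fmt:" are skipped because they contain no "field:" prefix.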

- Arnaldo