Message-ID: <20181105123324.GA11147@kernel.org>
Date: Mon, 5 Nov 2018 09:33:24 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Yonghong Song <yhs@...com>
Cc: Edward Cree <ecree@...arflare.com>,
Daniel Borkmann <daniel@...earbox.net>,
Jiri Olsa <jolsa@...hat.com>, Martin Lau <kafai@...com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Linux Networking Development Mailing List
<netdev@...r.kernel.org>
Subject: Re: Help with the BPF verifier
On Fri, Nov 02, 2018 at 09:27:52PM +0000, Yonghong Song wrote:
>
>
> On 11/2/18 8:42 AM, Edward Cree wrote:
> > On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> >> Yeah, that didn't work either:
> >
> >> And the -vv in 'perf trace' didn't seem to map to further details in the
> >> output of the verifier debug:
> > Yeah for log_level 2 you probably need to make source-level changes to either
> > perf or libbpf (I think the latter). It's annoying that essentially no tools
> > plumb through an option for that, someone should fix them ;-)
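For reference, log_level 2 is just a field in the bpf(2) BPF_PROG_LOAD
attributes, so it can be requested straight from the kernel when the tooling
doesn't expose it. A minimal, untested sketch using the raw syscall (the
prog_type and the insns/insn_cnt parameters are placeholders):

#include <linux/bpf.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

static char bpf_log_buf[1 << 20];

static int prog_load(const struct bpf_insn *insns, unsigned int insn_cnt)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_RAW_TRACEPOINT;
	attr.insns     = (__u64)(unsigned long)insns;
	attr.insn_cnt  = insn_cnt;
	attr.license   = (__u64)(unsigned long)"GPL";
	attr.log_buf   = (__u64)(unsigned long)bpf_log_buf;
	attr.log_size  = sizeof(bpf_log_buf);
	attr.log_level = 2; /* 1 = log on failure, 2 = verbose per-insn state */

	return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
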
> >
> >> libbpf: -- BEGIN DUMP LOG ---
> >> libbpf:
> >> 0: (bf) r6 = r1
> >> 1: (bf) r1 = r10
> >> 2: (07) r1 += -328
> >> 3: (b7) r7 = 64
> >> 4: (b7) r2 = 64
> >> 5: (bf) r3 = r6
> >> 6: (85) call bpf_probe_read#4
> >> 7: (79) r1 = *(u64 *)(r10 -320)
> >> 8: (15) if r1 == 0x101 goto pc+4
> >> R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> >> 9: (55) if r1 != 0x2 goto pc+22
> >> R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> >> 10: (bf) r1 = r6
> >> 11: (07) r1 += 16
> >> 12: (05) goto pc+2
> >> 15: (79) r3 = *(u64 *)(r1 +0)
> >> dereference of modified ctx ptr R1 off=16 disallowed
> > Aha, we at least got a different error message this time.
> > And indeed llvm has done that optimisation, rather than the more obvious
> > 11: r3 = *(u64 *)(r1 +16)
> > because it wants to have lots of reads share a single insn. You may be able
> > to defeat that optimisation by adding compiler barriers, idk. Maybe someone
> > with llvm knowledge can figure out how to stop it (ideally, llvm would know
> > when it's generating for bpf backend and not do that). -O0? ¯\_(ツ)_/¯
>
> The optimization mostly looks like below:
>
>    br1:
>      ...
>      r1 += 16
>      goto merge
>    br2:
>      ...
>      r1 += 20
>      goto merge
>    merge:
>      *(u64 *)(r1 + 0)
>
> The compiler tries to merge common loads. There is no easy way to
> stop this compiler optimization without turning off a lot of other
> optimizations. The easiest way is to add barriers
> __asm__ __volatile__("": : :"memory")
> after the ctx memory access to prevent their downstream merging.
Great, this made it work:
cat /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
#include <stdio.h>
#include <linux/socket.h>

/* bpf-output associated map */
struct bpf_map SEC("maps") __augmented_syscalls__ = {
	.type	     = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size    = sizeof(int),
	.value_size  = sizeof(u32),
	.max_entries = __NR_CPUS__,
};

struct syscall_enter_args {
	unsigned long long common_tp_fields;
	long		   syscall_nr;
	unsigned long	   args[6];
};

struct syscall_exit_args {
	unsigned long long common_tp_fields;
	long		   syscall_nr;
	long		   ret;
};

struct augmented_filename {
	unsigned int	size;
	int		reserved;
	char		value[256];
};

#define SYS_OPEN 2
#define SYS_OPENAT 257

SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
	struct {
		struct syscall_enter_args args;
		struct augmented_filename filename;
	} augmented_args;
	unsigned int len = sizeof(augmented_args);
	const void *filename_arg = NULL;

	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
	switch (augmented_args.args.syscall_nr) {
	case SYS_OPEN:	 filename_arg = (const void *)args->args[0];
			 /* Barrier: stop llvm from merging this ctx load with
			  * the SYS_OPENAT one, which the verifier rejects as a
			  * "dereference of modified ctx ptr". */
			 __asm__ __volatile__("": : :"memory");
			 break;
	case SYS_OPENAT: filename_arg = (const void *)args->args[1];
			 break;
	default:
			 return 0;
	}

	if (filename_arg != NULL) {
		augmented_args.filename.reserved = 0;
		augmented_args.filename.size	 = probe_read_str(&augmented_args.filename.value,
								  sizeof(augmented_args.filename.value),
								  filename_arg);
		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
			len &= sizeof(augmented_args.filename.value) - 1;
		}
	} else {
		len = sizeof(augmented_args.args);
	}

	perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
	return 0;
}

SEC("raw_syscalls:sys_exit")
int sys_exit(struct syscall_exit_args *args)
{
	struct syscall_exit_args augmented_args;

	probe_read(&augmented_args, sizeof(augmented_args), args);
	return augmented_args.syscall_nr == SYS_OPEN || augmented_args.syscall_nr == SYS_OPENAT;
}

license(GPL);
Now I have both open and openat getting the right filename:
[root@...et perf]# trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
LLVM: dumping tools/perf/examples/bpf/augmented_raw_syscalls.o
0.000 ( 0.010 ms): sleep/15794 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
0.028 ( 0.005 ms): sleep/15794 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
0.241 ( 0.005 ms): sleep/15794 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
[root@...et perf]#
[root@...et perf]# trace -e tools/perf/examples/bpf/augmented_raw_syscalls.o cat /etc/passwd > /dev/null
0.000 ( 0.010 ms): cat/15918 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
0.028 ( 0.006 ms): cat/15918 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
0.241 ( 0.007 ms): cat/15918 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
0.289 ( 0.004 ms): cat/15918 openat(dfd: CWD, filename: /etc/passwd) = 3
[root@...et perf]#
[root@...et perf]# readelf -SW tools/perf/examples/bpf/augmented_raw_syscalls.o
There are 11 section headers, starting at offset 0x438:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 0000000000000000 000376 0000bd 00 0 0 1
[ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4
[ 3] raw_syscalls:sys_enter PROGBITS 0000000000000000 000040 000138 00 AX 0 0 8
[ 4] .relraw_syscalls:sys_enter REL 0000000000000000 000360 000010 10 10 3 8
[ 5] raw_syscalls:sys_exit PROGBITS 0000000000000000 000178 000070 00 AX 0 0 8
[ 6] maps PROGBITS 0000000000000000 0001e8 000038 00 WA 0 0 4
[ 7] license PROGBITS 0000000000000000 000220 000004 00 WA 0 0 1
[ 8] version PROGBITS 0000000000000000 000224 000004 00 WA 0 0 4
[ 9] .llvm_addrsig LOOS+0xfff4c03 0000000000000000 000370 000006 00 E 10 0 1
[10] .symtab SYMTAB 0000000000000000 000228 000138 18 1 7 8
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
[root@...et perf]#
> > Alternatively, your prog looks short enough that maybe you could kick the C
> > habit and write it directly in eBPF asm, that way no-one is optimising things
> > behind your back. (I realise this option won't appeal to everyone ;-)
>
> LLVM supports BPF inline assembly as well. Some examples are here:
> https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/BPF/inline_asm.ll
> You may try it for selective ctx accesses to work around some
> compiler optimizations. I personally have not used it yet and am
> actually not sure whether it works or not :-)
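For illustration, forcing a single full-offset ctx load with inline asm could
look roughly like the below. This is an untested sketch; the 16 is the offset
of args[0] in the sys_enter ctx, and the constraint syntax follows the llvm
test referenced above:

	unsigned long long filename_ptr;

	/* Untested: do the ctx load in one asm insn so the compiler can
	 * neither split it into "r1 += 16; rX = *(u64 *)(r1 + 0)" nor
	 * merge it with loads on other paths. */
	asm volatile("%0 = *(u64 *)(%1 + 16)"
		     : "=r"(filename_ptr)
		     : "r"(args));
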
I will take a look at it, but the simple barrier right after the first ctx
access did the trick for me; I'll probably cook up some ctx accessor macros
to make this transparent.
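Such an accessor could be as simple as a macro pairing the load with the
barrier; a hypothetical sketch (the SC_ENTER_ARG name is made up):

/* Hypothetical accessor: load one syscall arg from the ctx, then emit a
 * compiler barrier so llvm can't merge this load with the ones done on
 * the other switch arms. */
#define SC_ENTER_ARG(ctx, n) ({					\
	const void *__arg = (const void *)(ctx)->args[n];	\
	__asm__ __volatile__("" : : : "memory");		\
	__arg;							\
})

which would turn the switch above into:

	case SYS_OPEN:	 filename_arg = SC_ENTER_ARG(args, 0); break;
	case SYS_OPENAT: filename_arg = SC_ENTER_ARG(args, 1); break;
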
Now to make -e open,openat set up filters via a map, get that
augmented_raw_syscalls.c added automatically to the evlist, etc.
- Arnaldo
> > The reason the verifier disallows this, iirc, is because it needs to be able
> > to rewrite the offsets on ctx accesses (see convert_ctx_accesses()) in case
> > the underlying kernel struct doesn't match the layout of the ctx ABI. To do
> > this it needs the ctx offset to live entirely in the insn doing the access,
> > otherwise different paths could lead to the same insn accessing different ctx
> > offsets with different fixups needed — can't be done.
> >
> > -Ed
> >