netdev - Re: [PATCH v3 bpf-next 2/3] selftests/bpf: add variable-length data concatenation pattern test

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5ef2ecf4b7bd9_37452b132c4de5bcc@john-XPS-13-9370.notmuch>
Date:   Tue, 23 Jun 2020 23:04:36 -0700
From:   John Fastabend <john.fastabend@...il.com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>,
        Yonghong Song <yhs@...com>
Cc:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
        Networking <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        john fastabend <john.fastabend@...il.com>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH v3 bpf-next 2/3] selftests/bpf: add variable-length data
 concatenation pattern test

Andrii Nakryiko wrote:
> On Tue, Jun 23, 2020 at 5:25 PM Yonghong Song <yhs@...com> wrote:
> >
> >
> >
> > On 6/23/20 4:25 PM, Alexei Starovoitov wrote:
> > > On Tue, Jun 23, 2020 at 11:15:58PM +0200, Daniel Borkmann wrote:
> > >> On 6/23/20 10:52 PM, Andrii Nakryiko wrote:
> > >>> On Tue, Jun 23, 2020 at 1:39 PM Daniel Borkmann <daniel@...earbox.net> wrote:
> > >>>> On 6/23/20 5:22 AM, Andrii Nakryiko wrote:
> > >>>>> Add selftest that validates variable-length data reading and concatentation
> > >>>>> with one big shared data array. This is a common pattern in production use for
> > >>>>> monitoring and tracing applications, that potentially can read a lot of data,
> > >>>>> but overall read much less. Such pattern allows to determine precisely what
> > >>>>> amount of data needs to be sent over perfbuf/ringbuf and maximize efficiency.
> > >>>>>
> > >>>>> Signed-off-by: Andrii Nakryiko <andriin@...com>
> > >>>>
> > >>>> Currently getting the below errors on these tests. My last clang/llvm git build
> > >>>> is on 4676cf444ea2 ("[Clang] Skip adding begin source location for PragmaLoopHint'd
> > >>>> loop when[...]"):
> > >>>
> > >>> Yeah, you need 02553b91da5d ("bpf: bpf_probe_read_kernel_str() has to
> > >>> return amount of data read on success") from bpf tree.
> > >>
> > >> Fair point, it's in net- but not yet in net-next tree, so bpf-next sync needs
> > >> to wait.
> > >>
> > >>> I'm eagerly awaiting bpf being merged into bpf-next :)
> > >>
> > >> I'll cherry-pick 02553b91da5d locally for testing and if it passes I'll push
> > >> these out.
> > >
> > > I've merged the bpf_probe_read_kernel_str() fix into bpf-next and 3 extra commits
> > > prior to that one so that sha of the bpf_probe_read_kernel_str() fix (02553b91da5de)
> > > is exactly the same in bpf/net/linus/bpf-next. I think that shouldn't cause
> > > issue during bpf-next pull into net-next and later merge with Linus's tree.
> > > Crossing fingers, since we're doing this experiment for the first time.
> > >
> > > Daniel pushed these 3 commits as well.
> > > Now varlen and kernel_reloc tests are good, but we have a different issue :(
> > > ./test_progs-no_alu32 -t get_stack_raw_tp
> > > is now failing, but for a different reason.
> > >
> > > 52: (85) call bpf_get_stack#67
> > > 53: (bf) r8 = r0
> > > 54: (bf) r1 = r8
> > > 55: (67) r1 <<= 32
> > > 56: (c7) r1 s>>= 32
> > > ; if (usize < 0)
> > > 57: (c5) if r1 s< 0x0 goto pc+26
> > >   R0=inv(id=0,smax_value=800) R1_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R8_w=inv(id=0,smax_value=800) R9=inv800 R10=fp0 fp-8=mmmm????
> > > ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> > > 58: (1f) r9 -= r8
> > > ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> > > 59: (bf) r2 = r7
> > > 60: (0f) r2 += r1
> > > regs=1 stack=0 before 52: (85) call bpf_get_stack#67
> > > ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> > > 61: (bf) r1 = r6
> > > 62: (bf) r3 = r9
> > > 63: (b7) r4 = 0
> > > 64: (85) call bpf_get_stack#67
> > >   R0=inv(id=0,smax_value=800) R1_w=ctx(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=800,var_off=(0x0; 0x3ff),s32_max_value=1023,u32_max_value=1023) R3_w=inv(id=0,umax_value=9223372036854776608) R4_w=inv0 R6=ctx(id=0?
> > > R3 unbounded memory access, use 'var &= const' or 'if (var < const)'
> > >
> > > In the C code it was this:
> > >          usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
> > >          if (usize < 0)
> > >                  return 0;
> > >
> > >          ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> > >          if (ksize < 0)
> > >                  return 0;
> > >
> > > We used to have problem with pointer arith in R2.
> > > Now it's a problem with two integers in R3.
> > > 'if (usize < 0)' is comparing R1 and makes it [0,800], but R8 stays [-inf,800].
> > > Both registers represent the same 'usize' variable.
> > > Then R9 -= R8 is doing 800 - [-inf, 800]
> > > so the result of "max_len - usize" looks unbounded to the verifier while
> > > it's obvious in C code that "max_len - usize" should be [0, 800].
> > >
> > > The following diff 'fixes' the issue for no_alu32:
> > > diff --git a/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c b/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
> > > index 29817a703984..93058136d608 100644
> > > --- a/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
> > > +++ b/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
> > > @@ -2,6 +2,7 @@
> > >
> > >   #include <linux/bpf.h>
> > >   #include <bpf/bpf_helpers.h>
> > > +#define var_barrier(a) asm volatile ("" : "=r"(a) : "0"(a))
> > >
> > >   /* Permit pretty deep stack traces */
> > >   #define MAX_STACK_RAWTP 100
> > > @@ -84,10 +85,12 @@ int bpf_prog1(void *ctx)
> > >                  return 0;
> > >
> > >          usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
> > > +       var_barrier(usize);
> > >          if (usize < 0)
> > >                  return 0;
> > >
> > >          ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> > > +       var_barrier(ksize);
> > >          if (ksize < 0)
> > >                  return 0;
> > >
> > > But it breaks alu32 case.
> > >
> > > I'm using llvm 11 fwiw.
> > >
> > > Long term Yonghong is working on llvm support to emit this kind
> > > of workarounds automatically.
> > > I'm still thinking what to do next. Ideas?
> >
> 
> Funny enough, Alexei's fix didn't fix even no_alu32 case for me. Also
> have one of the latest clang 11...
> 
> > The following source change will make both alu32 and non-alu32 happy:
> >
> >   SEC("raw_tracepoint/sys_enter")
> >   int bpf_prog1(void *ctx)
> >   {
> > -       int max_len, max_buildid_len, usize, ksize, total_size;
> > +       int max_len, max_buildid_len, total_size;
> > +       long usize, ksize;
> 
> This does fix it, both alu32 and no-alu32 pass.
> 
> >          struct stack_trace_t *data;
> >          void *raw_data;
> >          __u32 key = 0;
> >
> > I have not checked the reason why it works. Mostly this confirms to
> > the function signature so compiler generates more friendly code.
> 
> Yes, it's due to the compiler not doing all the casting/bit shifting.
> Just straightforward use of a single register consistently across
> conditional jump and offset calculations.

Another option is to drop the int->long uapi conversion and write the
varlen test using >=0 tests. The below diff works for me also using
recent clang-11, but maybe doesn't resolve Andrii's original issue.
My concern is if we break existing code in selftests is there a risk
users will get breaking code as well? Seems like without a few
additional clang improvements its going to be hard to get all
combinations working by just fiddling with the types.

diff --git a/tools/testing/selftests/bpf/progs/test_varlen.c b/tools/testing/selftests/bpf/progs/test_varlen.c
index 0969185..01c992c 100644
--- a/tools/testing/selftests/bpf/progs/test_varlen.c
+++ b/tools/testing/selftests/bpf/progs/test_varlen.c
@@ -31,20 +31,20 @@ int handler64(void *regs)
 {
 	int pid = bpf_get_current_pid_tgid() >> 32;
 	void *payload = payload1;
-	u64 len;
+	int len;
 
 	/* ignore irrelevant invocations */
 	if (test_pid != pid || !capture)
 		return 0;
 
 	len = bpf_probe_read_kernel_str(payload, MAX_LEN, &buf_in1[0]);
-	if (len <= MAX_LEN) {
+	if (len >= 0) {
 		payload += len;
 		payload1_len1 = len;
 	}
 
 	len = bpf_probe_read_kernel_str(payload, MAX_LEN, &buf_in2[0]);
-	if (len <= MAX_LEN) {
+	if (len >= 0) {
 		payload += len;
 		payload1_len2 = len;
 	}
@@ -59,20 +59,20 @@ int handler32(void *regs)
 {
 	int pid = bpf_get_current_pid_tgid() >> 32;
 	void *payload = payload2;
-	u32 len;
+	int len;
 
 	/* ignore irrelevant invocations */
 	if (test_pid != pid || !capture)
 		return 0;
 
 	len = bpf_probe_read_kernel_str(payload, MAX_LEN, &buf_in1[0]);
-	if (len <= MAX_LEN) {
+	if (len >= 0) {
 		payload += len;
 		payload2_len1 = len;
 	}
 
 	len = bpf_probe_read_kernel_str(payload, MAX_LEN, &buf_in2[0]);
-	if (len <= MAX_LEN) {
+	if (len >= 0) {
 		payload += len;
 		payload2_len2 = len;
 	}