netdev - Re: [PATCH v3 bpf-next 2/3] selftests/bpf: add variable-length data concatenation pattern test

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200623232503.yzm4g24rwx7khudf@ast-mbp.dhcp.thefacebook.com>
Date:   Tue, 23 Jun 2020 16:25:03 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Andrii Nakryiko <andrii.nakryiko@...il.com>,
        Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
        Networking <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        john fastabend <john.fastabend@...il.com>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH v3 bpf-next 2/3] selftests/bpf: add variable-length data
 concatenation pattern test

On Tue, Jun 23, 2020 at 11:15:58PM +0200, Daniel Borkmann wrote:
> On 6/23/20 10:52 PM, Andrii Nakryiko wrote:
> > On Tue, Jun 23, 2020 at 1:39 PM Daniel Borkmann <daniel@...earbox.net> wrote:
> > > On 6/23/20 5:22 AM, Andrii Nakryiko wrote:
> > > > Add selftest that validates variable-length data reading and concatentation
> > > > with one big shared data array. This is a common pattern in production use for
> > > > monitoring and tracing applications, that potentially can read a lot of data,
> > > > but overall read much less. Such pattern allows to determine precisely what
> > > > amount of data needs to be sent over perfbuf/ringbuf and maximize efficiency.
> > > > 
> > > > Signed-off-by: Andrii Nakryiko <andriin@...com>
> > > 
> > > Currently getting the below errors on these tests. My last clang/llvm git build
> > > is on 4676cf444ea2 ("[Clang] Skip adding begin source location for PragmaLoopHint'd
> > > loop when[...]"):
> > 
> > Yeah, you need 02553b91da5d ("bpf: bpf_probe_read_kernel_str() has to
> > return amount of data read on success") from bpf tree.
> 
> Fair point, it's in net- but not yet in net-next tree, so bpf-next sync needs
> to wait.
> 
> > I'm eagerly awaiting bpf being merged into bpf-next :)
> 
> I'll cherry-pick 02553b91da5d locally for testing and if it passes I'll push
> these out.

I've merged the bpf_probe_read_kernel_str() fix into bpf-next and 3 extra commits
prior to that one so that sha of the bpf_probe_read_kernel_str() fix (02553b91da5de)
is exactly the same in bpf/net/linus/bpf-next. I think that shouldn't cause
issue during bpf-next pull into net-next and later merge with Linus's tree.
Crossing fingers, since we're doing this experiment for the first time.

Daniel pushed these 3 commits as well.
Now varlen and kernel_reloc tests are good, but we have a different issue :(
./test_progs-no_alu32 -t get_stack_raw_tp
is now failing, but for a different reason.

52: (85) call bpf_get_stack#67
53: (bf) r8 = r0
54: (bf) r1 = r8
55: (67) r1 <<= 32
56: (c7) r1 s>>= 32
; if (usize < 0)
57: (c5) if r1 s< 0x0 goto pc+26
 R0=inv(id=0,smax_value=800) R1_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R8_w=inv(id=0,smax_value=800) R9=inv800 R10=fp0 fp-8=mmmm????
; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
58: (1f) r9 -= r8
; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
59: (bf) r2 = r7
60: (0f) r2 += r1
regs=1 stack=0 before 52: (85) call bpf_get_stack#67
; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
61: (bf) r1 = r6
62: (bf) r3 = r9
63: (b7) r4 = 0
64: (85) call bpf_get_stack#67
 R0=inv(id=0,smax_value=800) R1_w=ctx(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=800,var_off=(0x0; 0x3ff),s32_max_value=1023,u32_max_value=1023) R3_w=inv(id=0,umax_value=9223372036854776608) R4_w=inv0 R6=ctx(id=0?
R3 unbounded memory access, use 'var &= const' or 'if (var < const)'

In the C code it was this:
        usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
        if (usize < 0)
                return 0;

        ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
        if (ksize < 0)
                return 0;

We used to have problem with pointer arith in R2.
Now it's a problem with two integers in R3.
'if (usize < 0)' is comparing R1 and makes it [0,800], but R8 stays [-inf,800].
Both registers represent the same 'usize' variable.
Then R9 -= R8 is doing 800 - [-inf, 800]
so the result of "max_len - usize" looks unbounded to the verifier while
it's obvious in C code that "max_len - usize" should be [0, 800].

The following diff 'fixes' the issue for no_alu32:
diff --git a/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c b/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
index 29817a703984..93058136d608 100644
--- a/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
+++ b/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
@@ -2,6 +2,7 @@

 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
+#define var_barrier(a) asm volatile ("" : "=r"(a) : "0"(a))

 /* Permit pretty deep stack traces */
 #define MAX_STACK_RAWTP 100
@@ -84,10 +85,12 @@ int bpf_prog1(void *ctx)
                return 0;

        usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
+       var_barrier(usize);
        if (usize < 0)
                return 0;

        ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
+       var_barrier(ksize);
        if (ksize < 0)
                return 0;

But it breaks alu32 case.

I'm using llvm 11 fwiw.

Long term Yonghong is working on llvm support to emit this kind
of workarounds automatically.
I'm still thinking what to do next. Ideas?