netdev - Re: [PATCH v3 bpf-next 2/3] selftests/bpf: add variable-length data concatenation pattern test

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 23 Jun 2020 17:25:11 -0700
From:   Yonghong Song <yhs@...com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>
CC:     Andrii Nakryiko <andrii.nakryiko@...il.com>,
        Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
        Networking <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        john fastabend <john.fastabend@...il.com>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH v3 bpf-next 2/3] selftests/bpf: add variable-length data
 concatenation pattern test



On 6/23/20 4:25 PM, Alexei Starovoitov wrote:
> On Tue, Jun 23, 2020 at 11:15:58PM +0200, Daniel Borkmann wrote:
>> On 6/23/20 10:52 PM, Andrii Nakryiko wrote:
>>> On Tue, Jun 23, 2020 at 1:39 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>>>> On 6/23/20 5:22 AM, Andrii Nakryiko wrote:
>>>>> Add selftest that validates variable-length data reading and concatentation
>>>>> with one big shared data array. This is a common pattern in production use for
>>>>> monitoring and tracing applications, that potentially can read a lot of data,
>>>>> but overall read much less. Such pattern allows to determine precisely what
>>>>> amount of data needs to be sent over perfbuf/ringbuf and maximize efficiency.
>>>>>
>>>>> Signed-off-by: Andrii Nakryiko <andriin@...com>
>>>>
>>>> Currently getting the below errors on these tests. My last clang/llvm git build
>>>> is on 4676cf444ea2 ("[Clang] Skip adding begin source location for PragmaLoopHint'd
>>>> loop when[...]"):
>>>
>>> Yeah, you need 02553b91da5d ("bpf: bpf_probe_read_kernel_str() has to
>>> return amount of data read on success") from bpf tree.
>>
>> Fair point, it's in net- but not yet in net-next tree, so bpf-next sync needs
>> to wait.
>>
>>> I'm eagerly awaiting bpf being merged into bpf-next :)
>>
>> I'll cherry-pick 02553b91da5d locally for testing and if it passes I'll push
>> these out.
> 
> I've merged the bpf_probe_read_kernel_str() fix into bpf-next and 3 extra commits
> prior to that one so that sha of the bpf_probe_read_kernel_str() fix (02553b91da5de)
> is exactly the same in bpf/net/linus/bpf-next. I think that shouldn't cause
> issue during bpf-next pull into net-next and later merge with Linus's tree.
> Crossing fingers, since we're doing this experiment for the first time.
> 
> Daniel pushed these 3 commits as well.
> Now varlen and kernel_reloc tests are good, but we have a different issue :(
> ./test_progs-no_alu32 -t get_stack_raw_tp
> is now failing, but for a different reason.
> 
> 52: (85) call bpf_get_stack#67
> 53: (bf) r8 = r0
> 54: (bf) r1 = r8
> 55: (67) r1 <<= 32
> 56: (c7) r1 s>>= 32
> ; if (usize < 0)
> 57: (c5) if r1 s< 0x0 goto pc+26
>   R0=inv(id=0,smax_value=800) R1_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R8_w=inv(id=0,smax_value=800) R9=inv800 R10=fp0 fp-8=mmmm????
> ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> 58: (1f) r9 -= r8
> ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> 59: (bf) r2 = r7
> 60: (0f) r2 += r1
> regs=1 stack=0 before 52: (85) call bpf_get_stack#67
> ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> 61: (bf) r1 = r6
> 62: (bf) r3 = r9
> 63: (b7) r4 = 0
> 64: (85) call bpf_get_stack#67
>   R0=inv(id=0,smax_value=800) R1_w=ctx(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=800,var_off=(0x0; 0x3ff),s32_max_value=1023,u32_max_value=1023) R3_w=inv(id=0,umax_value=9223372036854776608) R4_w=inv0 R6=ctx(id=0?
> R3 unbounded memory access, use 'var &= const' or 'if (var < const)'
> 
> In the C code it was this:
>          usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
>          if (usize < 0)
>                  return 0;
> 
>          ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
>          if (ksize < 0)
>                  return 0;
> 
> We used to have problem with pointer arith in R2.
> Now it's a problem with two integers in R3.
> 'if (usize < 0)' is comparing R1 and makes it [0,800], but R8 stays [-inf,800].
> Both registers represent the same 'usize' variable.
> Then R9 -= R8 is doing 800 - [-inf, 800]
> so the result of "max_len - usize" looks unbounded to the verifier while
> it's obvious in C code that "max_len - usize" should be [0, 800].
> 
> The following diff 'fixes' the issue for no_alu32:
> diff --git a/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c b/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
> index 29817a703984..93058136d608 100644
> --- a/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
> +++ b/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
> @@ -2,6 +2,7 @@
> 
>   #include <linux/bpf.h>
>   #include <bpf/bpf_helpers.h>
> +#define var_barrier(a) asm volatile ("" : "=r"(a) : "0"(a))
> 
>   /* Permit pretty deep stack traces */
>   #define MAX_STACK_RAWTP 100
> @@ -84,10 +85,12 @@ int bpf_prog1(void *ctx)
>                  return 0;
> 
>          usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
> +       var_barrier(usize);
>          if (usize < 0)
>                  return 0;
> 
>          ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> +       var_barrier(ksize);
>          if (ksize < 0)
>                  return 0;
> 
> But it breaks alu32 case.
> 
> I'm using llvm 11 fwiw.
> 
> Long term Yonghong is working on llvm support to emit this kind
> of workarounds automatically.
> I'm still thinking what to do next. Ideas?

The following source change will make both alu32 and non-alu32 happy:

  SEC("raw_tracepoint/sys_enter")
  int bpf_prog1(void *ctx)
  {
-       int max_len, max_buildid_len, usize, ksize, total_size;
+       int max_len, max_buildid_len, total_size;
+       long usize, ksize;
         struct stack_trace_t *data;
         void *raw_data;
         __u32 key = 0;

I have not checked the reason why it works. Mostly this confirms to
the function signature so compiler generates more friendly code.