Message-ID: <20230307173529.gi2crls7fktn6uox@apollo>
Date: Tue, 7 Mar 2023 18:35:29 +0100
From: Kumar Kartikeya Dwivedi <memxor@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Joanne Koong <joannelkoong@...il.com>, bpf <bpf@...r.kernel.org>,
Martin KaFai Lau <martin.lau@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Network Development <netdev@...r.kernel.org>,
Toke Høiland-Jørgensen <toke@...nel.org>
Subject: Re: [PATCH v13 bpf-next 09/10] bpf: Add bpf_dynptr_slice and
bpf_dynptr_slice_rdwr
On Tue, Mar 07, 2023 at 04:45:14PM CET, Alexei Starovoitov wrote:
> On Tue, Mar 7, 2023 at 2:22 AM Kumar Kartikeya Dwivedi <memxor@...il.com> wrote:
> >
> > On Tue, Mar 07, 2023 at 03:23:25AM CET, Alexei Starovoitov wrote:
> > > On Sun, Mar 5, 2023 at 11:10 PM Kumar Kartikeya Dwivedi
> > > <memxor@...il.com> wrote:
> > > >
> > > > On Wed, Mar 01, 2023 at 04:49:52PM CET, Joanne Koong wrote:
> > > > > Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr.
> > > > > The user must pass in a buffer to store the contents of the data slice
> > > > > if a direct pointer to the data cannot be obtained.
> > > > >
> > > > > For skb and xdp type dynptrs, these two APIs are the only way to obtain
> > > > > a data slice. However, for other types of dynptrs, there is no
> > > > > difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data.
> > > > >
> > > > > For skb type dynptrs, the data is copied into the user provided buffer
> > > > > if any of the data is not in the linear portion of the skb. For xdp type
> > > > > dynptrs, the data is copied into the user provided buffer if the data is
> > > > > between xdp frags.
> > > > >
> > > > > If the skb is cloned and a call to bpf_dynptr_slice_rdwr is made, then
> > > > > the skb will be uncloned (see bpf_unclone_prologue()).
> > > > >
> > > > > Please note that any bpf_dynptr_write() automatically invalidates any prior
> > > > > data slices of the skb dynptr, even if the write is to data in the skb head
> > > > > of an uncloned skb. This is because the skb may be cloned or may need to
> > > > > pull its paged buffer into the head. Please note as well that any other
> > > > > helper calls that change the underlying packet buffer (e.g.
> > > > > bpf_skb_pull_data()) invalidate any data slices of the skb dynptr, for the
> > > > > same reasons.
> > > > >
> > > > > Signed-off-by: Joanne Koong <joannelkoong@...il.com>
> > > > > ---
> > > >
> > > > Sorry for chiming in late.
> > > >
> > > > I see one potential hole in bpf_dynptr_slice_rdwr. If the returned pointer is
> > > > actually pointing to the stack (but verified as a PTR_TO_MEM in verifier state),
> > > > we won't reflect changes to the stack state in the verifier for writes happening
> > > > through it.
> > > >
> > > > For the worst case scenario, this will basically allow overwriting values of
> > > > spilled pointers and doing arbitrary kernel memory reads/writes. This is only an
> > > > issue when bpf_dynptr_slice_rdwr at runtime returns a pointer to the supplied
> > > > buffer residing on the program stack. To verify this, I forced the memcpy into
> > > > the buffer in skb_header_pointer's fallback path and was able to make the
> > > > l4lb_all selftest dereference a garbage value.
> > > >
> > > > --- a/kernel/bpf/helpers.c
> > > > +++ b/kernel/bpf/helpers.c
> > > > @@ -2253,7 +2253,13 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
> > > >  	case BPF_DYNPTR_TYPE_RINGBUF:
> > > >  		return ptr->data + ptr->offset + offset;
> > > >  	case BPF_DYNPTR_TYPE_SKB:
> > > > -		return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer);
> > > > +	{
> > > > +		void *p = skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer);
> > > > +		if (p == buffer)
> > > > +			return p;
> > > > +		memcpy(buffer, p, len);
> > > > +		return buffer;
> > > > +	}
> > > >
> > > > --- a/tools/testing/selftests/bpf/progs/test_l4lb_noinline_dynptr.c
> > > > +++ b/tools/testing/selftests/bpf/progs/test_l4lb_noinline_dynptr.c
> > > > @@ -470,7 +470,10 @@ int balancer_ingress(struct __sk_buff *ctx)
> > > >  	eth = bpf_dynptr_slice_rdwr(&ptr, 0, buffer, sizeof(buffer));
> > > >  	if (!eth)
> > > >  		return TC_ACT_SHOT;
> > > > -	eth_proto = eth->eth_proto;
> > > > +	*(void **)buffer = ctx;
> > >
> > > Great catch.
> > > To fix the issue I think we should simply disallow such
> > > stack abuse. The compiler won't be spilling registers
> > > into a C array on the stack.
> > > This manual spill/fill is exploiting verifier logic.
> > > After bpf_dynptr_slice_rdwr() we can mark all slots of the
> > > buffer as STACK_POISON (or some better name) and
> > > reject spills into such slots.
> > >
> >
> > I agree this is simpler, but I'm not sure it will work properly. The verifier
> > won't know when the lifetime of the buffer ends, so if we disallow spills until
> > it is written over, it's going to be a pain for users.
> >
> > Something like:
> >
> > for (...) {
> > 	char buf[64];
> > 	bpf_dynptr_slice_rdwr(..., buf, 64);
> > 	...
> > }
> >
> > ... and then the compiler decides to spill something where buf was located on
> > the stack, outside the for loop. The verifier can't know when buf goes out of
> > scope to unpoison the slots.
>
> You're saying the "verifier doesn't know when buf ...".
> The same applies to the compiler. It has no visibility
> into what bpf_dynptr_slice_rdwr is doing.
That is true, it can't assume anything about the side effects. But I am talking
about the point in the program when the buffer object no longer lives. Any
further use of the escaped pointer to such an object is UB. The compiler is
well within its rights to reuse the object's stack storage at that point,
including for spilling registers, which is why I said "outside the for loop"
in my earlier reply.
> So it never spills into a declared C array
> as I tried to explain in the previous reply.
> Spill/fill slots are always invisible to C.
> (unless of course you do pointer arithmetic asm style)
When the declared array's lifetime ends, it can.
https://godbolt.org/z/Ez7v4xfnv
The 2nd call to bar as part of the unrolled loop happens with fp-8, then it
calls baz, spills r0 to fp-8, and calls bar again with fp-8.
If such a stack slot is marked STACK_POISON, the verifier will reject this
program.
>
> > > > +	*(void **)eth = (void *)0xdeadbeef;
> > > > +	ctx = *(void **)buffer;
> > > > +	eth_proto = eth->eth_proto + ctx->len;
> > > >  	if (eth_proto == bpf_htons(ETH_P_IP))
> > > >  		err = process_packet(&ptr, eth, nh_off, false, ctx);
> > > >
> > > > I think the proper fix is to treat it as a separate return type distinct from
> > > > PTR_TO_MEM like PTR_TO_MEM_OR_PKT (or handle PTR_TO_MEM | DYNPTR_* specially),
> > > > fork verifier state whenever there is a write, so that one path verifies it as
> > > > PTR_TO_PACKET, while another as PTR_TO_STACK (if buffer was a stack ptr). I
> > > > think for the rest it's not a problem, but there are allow_ptr_leak checks
> > > > applied to PTR_TO_STACK and PTR_TO_MAP_VALUE, so that needs to be rechecked.
> > > > Then we ensure that program is safe in either path.
> > > >
> > > > Also we need to fix regsafe to not consider other PTR_TO_MEMs equivalent to such
> > > > a pointer. We could also fork verifier states on return, to verify either path
> > > > separately right from the point following the call instruction.
> > >
> > > This is too complex imo.
> >
> > A better way to phrase this is to verify with R0 = PTR_TO_PACKET in one path,
> > and push_stack with R0 = buffer's reg->type + size set to len in the other path
> > for exploration later. In terms of verifier infra everything is there already;
> > it just needs to analyze both cases, which fall into the regular code handling
> > each reg->type. Probably no adjustments to regsafe are needed then either. It's
> > like exploring branch instructions.
>
> I still don't like it. There is no reason to go a complex path
> when much simpler suffices.