[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJnrk1Ym+3QH0xFaBOGvY=nU2ohL4EDZ0kv5jMtvR_YzNsiRiw@mail.gmail.com>
Date: Wed, 8 Feb 2023 12:13:31 -0800
From: Joanne Koong <joannelkoong@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Andrii Nakryiko <andrii.nakryiko@...il.com>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Network Development <netdev@...r.kernel.org>,
Kumar Kartikeya Dwivedi <memxor@...il.com>,
Kernel Team <kernel-team@...com>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH v9 bpf-next 3/5] bpf: Add skb dynptrs
On Tue, Feb 7, 2023 at 6:25 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Fri, Feb 03, 2023 at 01:37:46PM -0800, Andrii Nakryiko wrote:
> > On Thu, Feb 2, 2023 at 3:43 AM Alexei Starovoitov
> > <alexei.starovoitov@...il.com> wrote:
> > >
> > > On Wed, Feb 1, 2023 at 5:21 PM Andrii Nakryiko
> > > <andrii.nakryiko@...il.com> wrote:
> > > >
> > > > On Tue, Jan 31, 2023 at 4:40 PM Alexei Starovoitov
> > > > <alexei.starovoitov@...il.com> wrote:
> > > > >
> > > > > On Tue, Jan 31, 2023 at 04:11:47PM -0800, Andrii Nakryiko wrote:
> > > > > > >
> > > > > > > When prog is just parsing the packet it doesn't need to finalize with bpf_dynptr_write.
> > > > > > > The prog can always write into the pointer followed by if (p == buf) bpf_dynptr_write.
> > > > > > > No need for rdonly flag, but extra copy is there in case of cloned which
> > > > > > > could have been avoided with extra rd_only flag.
> > > > > >
> > > > > > Yep, given we are designing bpf_dynptr_slice for performance, extra
> > > > > > copy on reads is unfortunate. ro/rw flag or have separate
> > > > > > bpf_dynptr_slice_rw vs bpf_dynptr_slice_ro?
> > > > >
> > > > > Either flag or two kfuncs sound good to me.
> > > >
> > > > Would it make sense to make bpf_dynptr_slice() as read-only variant,
> > > > and bpf_dynptr_slice_rw() for read/write? I think the common case is
> > > > read-only, right? And if users mistakenly use bpf_dynptr_slice() for
> > > > r/w case, they will get a verifier error when trying to write into the
> > > > returned pointer. While if we make bpf_dynptr_slice() as read-write,
> > > > users won't realize they are paying a performance penalty for
> > > > something that they don't actually need.
> > >
> > > Makes sense and it matches skb_header_pointer() usage in the kernel
> > > which is read-only. Since there is no verifier the read-only-ness
> > > is not enforced, but we can do it.
> > >
> > > Looks like we've converged on bpf_dynptr_slice() and bpf_dynptr_slice_rw().
> > > The question remains what to do with bpf_dynptr_data() backed by skb/xdp.
> > > Should we return EINVAL to discourage its usage?
> > > Of course, we can come up with sensible behavior for bpf_dynptr_data(),
> > > but it will have quirks that will be not easy to document.
> > > Even with extensive docs the users might be surprised by the behavior.
> >
> > I feel like having bpf_dynptr_data() working in the common case for
> > skb/xdp would be nice (e.g., so basically at least work in cases when
> > we don't need to pull).
> >
> > But we've been discussing bpf_dynptr_slice() with Joanne today, and we
> > came to the conclusion that bpf_dynptr_slice()/bpf_dynptr_slice_rw()
> > should work for any kind of dynptr (LOCAL, RINGBUF, SKB, XDP). So
> > generic code that wants to work with any dynptr would be able to just
> > use bpf_dynptr_slice, even for LOCAL/RINGBUF, even though buffer won't
> > ever be filled for LOCAL/RINGBUF.
>
> great
>
> > In application, though, if I know I'm working with LOCAL or RINGBUF
> > (or MALLOC, once we have it), I'd use bpf_dynptr_data() to fill out
> > fixed parts, of course. bpf_dynptr_slice() would be cumbersome for
> > such cases (especially if I have some huge fixed part that I *know* is
> > available in RINGBUF/MALLOC case).
>
> bpf_dynptr_data() for local and ringbuf is fine, of course.
> It already exists and has to continue working.
> bpf_dynptr_data() for xdp is probably ok as well,
> but bpf_dynptr_data() for skb is problematic.
> data/data_end concept looked great back in 2016 when it was introduced
> and lots of programs were written, but we underestimated the impact
> of driver's copybreak on programs.
> Network parsing progs consume headers one by one and would typically
> be written as:
> if (header > data_end)
> return DROP;
> Some drivers copybreak fixed number of bytes. Others try to be smart
> and copy only headers into linear part of skb.
> The drivers also change. At one point we tried to upgrade the kernel
> and suddenly bpf firewall started blocking valid traffic.
> Turned out the driver copybreak heuristic was changed in that kernel.
> The bpf prog was converted to use skb_load_bytes() and the fire was extinguished.
> It was a hard lesson.
> Others learned the danger of data/data_end the hard way as well.
> Take a look at cloudflare's progs/test_cls_redirect.c.
> It's a complicated combination of data/data_end and skb_load_bytes().
> It's essentially implementing skb_header_pointer.
> I wish we could use bpf_dynptr_slice only and remove data/data_end,
> but we cannot, since it's uapi.
> But we shouldn't repeat the same mistake. If we do bpf_dynptr_data()
> that returns linear part people will be hitting the same fragility and
> difficult to debug bugs.
> bpf_dynptr_data() for XDP is ok-ish, since most of XDP is still
> page-per-packet, but patches to split headers in HW are starting to appear.
> So even for XDP data/data_end concept may bite us.
> Hence my preference is to EINVAL in bpf_dynptr_data() at least for skb,
> since bpf_dynptr_slice() is a strictly better alternative.
This makes sense to me, I will have the next version of this patchset
return -EINVAL if bpf_dynptr_data() is used on a skb or xdp dynptr
Powered by blists - more mailing lists