Message-ID: <CAM_iQpVm_GdVT7MiuVGpzvx9zEsXKjZer5yF8Vh8c3EKVBM3-Q@mail.gmail.com>
Date: Mon, 15 Feb 2021 20:06:32 -0800
From: Cong Wang <xiyou.wangcong@...il.com>
To: John Fastabend <john.fastabend@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, duanxiongchun@...edance.com,
Dongdong Wang <wangdongdong.6@...edance.com>,
jiang.wang@...edance.com, Cong Wang <cong.wang@...edance.com>,
Daniel Borkmann <daniel@...earbox.net>,
Jakub Sitnicki <jakub@...udflare.com>,
Lorenz Bauer <lmb@...udflare.com>
Subject: Re: [Patch bpf-next v3 4/5] skmsg: use skb ext instead of TCP_SKB_CB
On Mon, Feb 15, 2021 at 5:50 PM John Fastabend <john.fastabend@...il.com> wrote:
>
> Cong Wang wrote:
> > On Mon, Feb 15, 2021 at 4:54 PM John Fastabend <john.fastabend@...il.com> wrote:
> > >
> > > Cong Wang wrote:
> > > > On Mon, Feb 15, 2021 at 3:57 PM John Fastabend <john.fastabend@...il.com> wrote:
> > > > >
> > > > > For TCP case we can continue to use CB and not pay the price. For UDP
> > > > > and AF_UNIX we can do the extra alloc.
> > > >
> > > > I see your point, but specializing TCP case does not give much benefit
> > > > here, the skmsg code would have to check skb->protocol etc. to decide
> > > > whether to use TCP_SKB_CB() or skb_ext:
> > > >
> > > > >     if (skb->protocol == ...)
> > > > >             TCP_SKB_CB(skb) = ...;
> > > > >     else
> > > > >             ext = skb_ext_find(skb);
> > > >
> > > > which looks ugly to me. And I doubt skb->protocol alone is sufficient to
> > > > distinguish TCP, so we may end up having more checks above.
> > > >
> > > > > So do you really want to trade code readability for saving an extra alloc?
> > >
> > > Above is ugly. So I look at where the patch replaces things,
> > >
> > > sk_psock_tls_strp_read(), this is TLS specific read hook so can't really
> > > work in generic case anyways.
> > >
> > > > sk_psock_strp_read(), will you have UDP, AF_UNIX stream parsers? Do these
> > > > even work outside TCP cases?
> > >
> > > For these ones: sk_psock_verdict_apply(), sk_psock_verdict_recv(),
> > > sk_psock_backlog(), can't we just do some refactoring around their
> > > hook points so we know the context. For example sk_psock_tls_verdict_apply
> > > is calling sk_psock_skb_redirect(). Why not have a sk_psock_unix_redirect()
> > > and a sk_psock_udp_redirect(). There are likely some optimizations we can
> > > > deploy this way. We've already done this for tls and sk_msg types for example.
> > >
> > > Then the helpers will know their types by program type, just use the right
> > > variants.
> > >
> > > > So I'm not suggesting if/else checks so much as having per-type hooks.
> > >
> >
> > Hmm, but sk_psock_backlog() is still the only one that handles all three
> > above cases, right? It uses TCP_SKB_CB() too and more importantly it
> > is also why we can't use a per-cpu struct here (see bpf_redirect_info).
>
> Right, but the workqueue is created at init time where we will know the
> socket type based on the program/map types so can build the redirect
> backlog queue there based on the type needed. I also have a patch in

Hmm? How could a socket type match the skb type when we redirect
across protocols?

In my use case, I want to redirect an AF_UNIX skb to a UDP socket.
Clearly, checking the UDP socket's workqueue can't tell us that it is
an AF_UNIX skb. It has to be a per-skb check.

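As a rough userspace illustration of this point (not kernel code; every
type and function name below is a hypothetical stand-in, not an existing
kernel API), compare deriving a packet's origin from the destination
socket with reading it from the packet itself:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are not kernel types. */
enum mock_proto { MOCK_TCP, MOCK_UDP, MOCK_UNIX };

struct mock_sock {
	enum mock_proto proto;   /* protocol of the socket owning the backlog */
};

struct mock_skb {
	enum mock_proto origin;  /* protocol the packet was received on */
};

/* Per-socket check: guess the packet's origin from the destination socket. */
static enum mock_proto origin_from_sock(const struct mock_sock *dst,
					const struct mock_skb *skb)
{
	assert(dst != NULL && skb != NULL);
	return dst->proto;
}

/* Per-skb check: read the origin recorded on the packet itself. */
static enum mock_proto origin_from_skb(const struct mock_sock *dst,
				       const struct mock_skb *skb)
{
	assert(dst != NULL && skb != NULL);
	return skb->origin;
}
```

With an AF_UNIX packet queued on a UDP socket, the per-socket variant
reports UDP while the per-skb variant reports AF_UNIX, which is the
mismatch described above.
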
> mind that would do more specific TCP things in that code anyways. I
> can flesh it out this week if anyone cares. The idea is we are wasting
> lots of cycles using skb_send_sock_locked() when we can just inject
> the packet directly into the TCP stack.

Actually, I did try the same thing; it clearly doesn't work for the
cross-protocol case.

Anyway, please let me know what your suggestion is for skb ext here.
It looks like we either have some ugly packet type checks, or just
use the skb ext. I don't see any other way yet. I also explored
struct sk_buff again and still cannot find anything we can reuse.
(_skb_refdst can only be reused very briefly with
tcp_skb_tsorted_save().)

Therefore, I believe using skb ext is still the best solution here.

>
> Also on the original patch here, we can't just kfree_skb() on alloc
> errors because that will look like a data drop. Errors need to be
> handled gracefully without dropping data. At least in the TCP case,
> but probably also in UDP and AF_UNIX cases as well. Original code
> was pretty loose in this regard, but it caused users to write bug
> reports and then I've been fixing most of them. If you see more
> cases let me know.

What's your suggestion here? Return -EAGAIN? That would require the
caller to retry in a loop to be graceful, and we can't do that in,
for example, sk_psock_tls_strp_read().
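
To make the concern concrete, here is a minimal userspace sketch of what
"return -EAGAIN and let the caller loop" would look like. It is only an
illustration under stated assumptions: mock_pkt, mock_alloc and both
functions are hypothetical names, and the failing allocator is simulated
with a counter. It says nothing about contexts where such a loop is not
possible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical packet: has_ext is set once the extension is attached. */
struct mock_pkt {
	int has_ext;
};

/* Hypothetical allocator that fails for the next fail_count attempts. */
struct mock_alloc {
	int fail_count;
};

/* One attempt: on allocation failure keep the packet and report -EAGAIN. */
static int mock_attach_ext(struct mock_alloc *a, struct mock_pkt *pkt)
{
	assert(a != NULL && pkt != NULL);
	if (a->fail_count > 0) {
		a->fail_count--;
		return -EAGAIN;	/* packet is left untouched, not dropped */
	}
	pkt->has_ext = 1;
	return 0;
}

/*
 * Caller-side retry loop: returns the number of attempts used on success,
 * or -EAGAIN if all max_tries attempts failed. The packet is never freed.
 */
static int mock_attach_ext_retry(struct mock_alloc *a, struct mock_pkt *pkt,
				 int max_tries)
{
	int tries;

	for (tries = 1; tries <= max_tries; tries++) {
		if (mock_attach_ext(a, pkt) == 0)
			return tries;
	}
	return -EAGAIN;
}
```

The loop lives entirely in the caller, which is the part that does not
fit a read callback that has to return immediately.
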
Thanks.