[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <602b24ca1fdca_51bf220832@john-XPS-13-9370.notmuch>
Date: Mon, 15 Feb 2021 17:50:02 -0800
From: John Fastabend <john.fastabend@...il.com>
To: Cong Wang <xiyou.wangcong@...il.com>,
John Fastabend <john.fastabend@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, duanxiongchun@...edance.com,
Dongdong Wang <wangdongdong.6@...edance.com>,
jiang.wang@...edance.com, Cong Wang <cong.wang@...edance.com>,
Daniel Borkmann <daniel@...earbox.net>,
Jakub Sitnicki <jakub@...udflare.com>,
Lorenz Bauer <lmb@...udflare.com>
Subject: Re: [Patch bpf-next v3 4/5] skmsg: use skb ext instead of TCP_SKB_CB
Cong Wang wrote:
> On Mon, Feb 15, 2021 at 4:54 PM John Fastabend <john.fastabend@...il.com> wrote:
> >
> > Cong Wang wrote:
> > > On Mon, Feb 15, 2021 at 3:57 PM John Fastabend <john.fastabend@...il.com> wrote:
> > > >
> > > > For TCP case we can continue to use CB and not pay the price. For UDP
> > > > and AF_UNIX we can do the extra alloc.
> > >
> > > I see your point, but specializing TCP case does not give much benefit
> > > here, the skmsg code would have to check skb->protocol etc. to decide
> > > whether to use TCP_SKB_CB() or skb_ext:
> > >
> > > if (skb->protocol == ...)
> > > TCP_SKB_CB(skb) = ...;
> > > else
> > > ext = skb_ext_find(skb);
> > >
> > > which looks ugly to me. And I doubt skb->protocol alone is sufficient to
> > > distinguish TCP, so we may end up having more checks above.
> > >
> > > So do you really want to trade code readability with an extra alloc?
> >
> > Above is ugly. So I look at where the patch replaces things,
> >
> > sk_psock_tls_strp_read(), this is TLS specific read hook so can't really
> > work in generic case anyways.
> >
> > sk_psock_strp_read(), will you have UDP, AF_UNIX stream parsers? Do these
> > even work outside TCP cases.
> >
> > For these ones: sk_psock_verdict_apply(), sk_psock_verdict_recv(),
> > sk_psock_backlog(), can't we just do some refactoring around their
> > hook points so we know the context. For example sk_psock_tls_verdict_apply
> > is calling sk_psock_skb_redirect(). Why not have a sk_psock_unix_redirect()
> > and a sk_psock_udp_redirect(). There are likely some optimizations we can
> > deploy this way. We've already don this for tls and sk_msg types for example.
> >
> > Then the helpers will know their types by program type, just use the right
> > variants.
> >
> > So not suggestiong if/else the checks so much as having per type hooks.
> >
>
> Hmm, but sk_psock_backlog() is still the only one that handles all three
> above cases, right? It uses TCP_SKB_CB() too and more importantly it
> is also why we can't use a per-cpu struct here (see bpf_redirect_info).
Right, but the workqueue is created at init time where we will know the
socket type based on the program/map types so can build the redirect
backlog queue there based on the type needed. I also have a patch in
mind that would do more specific TCP things in that code anyways. I
can flush it out this week if anyone cares. The idea is we are wasting
lots of cycles using skb_send_sock_locked when we can just inject
the packet directlyy into the tcp stack.
Also on the original patch here, we can't just kfree_skb() on alloc
errors because that will look like a data drop. Errors need to be
handled gracefully without dropping data. At least in the TCP case,
but probably also in UDP and AF_UNIX cases as well. Original code
was pretty loose in this regard, but it caused users to write bug
reports and then I've been fixing most of them. If you see more
cases let me know.
>
> Thanks.
Powered by blists - more mailing lists