netdev - Re: [PATCH v3 bpf-next 6/9] bpf: tcp: Allow bpf prog to write and parse TCP header option

Message-ID: <20200731175913.v4r2qjcvflehtyii@kafai-mbp.dhcp.thefacebook.com>
Date:   Fri, 31 Jul 2020 10:59:13 -0700
From:   Martin KaFai Lau <kafai@...com>
To:     Eric Dumazet <edumazet@...gle.com>
CC:     bpf <bpf@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        kernel-team <kernel-team@...com>,
        Lawrence Brakmo <brakmo@...com>,
        Neal Cardwell <ncardwell@...gle.com>,
        netdev <netdev@...r.kernel.org>,
        Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [PATCH v3 bpf-next 6/9] bpf: tcp: Allow bpf prog to write and
 parse TCP header option

On Fri, Jul 31, 2020 at 09:06:57AM -0700, Eric Dumazet wrote:
> On Thu, Jul 30, 2020 at 1:57 PM Martin KaFai Lau <kafai@...com> wrote:
> >
> > The earlier effort in BPF-TCP-CC allows the TCP Congestion Control
> > algorithm to be written in BPF.  It opens up opportunities to allow
> > a faster turnaround time in testing/releasing new congestion control
> > ideas to production environments.
> >
> > The same flexibility can be extended to writing TCP header options.
> > It is not uncommon that people want to test a new TCP header option
> > to improve TCP performance.  Another use case is data centers,
> > which are a more controlled environment and have more flexibility in
> > adding header options for internal-only use.
> >
> > For example, we want to test the idea of putting the maximum delayed
> > ACK in a TCP header option, which is similar to a draft RFC proposal [1].
> >
> > This patch introduces the necessary BPF API and uses it in the
> > TCP stack to allow BPF_PROG_TYPE_SOCK_OPS programs to parse
> > and write TCP header options.  It currently supports most
> > TCP packets, except RST.
> >
> > Supported TCP header option:
> > ───────────────────────────
> > This patch allows the bpf-prog to write any option kind.
> > Different bpf-progs can each write their own option by calling the
> > new helper bpf_store_hdr_opt().  The helper will ensure there is no
> > duplicated option in the header.
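
To illustrate the "no duplicated option" rule, here is a minimal user-space
sketch, not the kernel implementation.  The names opt_buf, opt_kind_present()
and store_opt() are hypothetical, and the errno-style return codes are
assumptions made for this sketch only.  It appends one kind/length/value
option to a 40-byte option area and refuses a kind that is already present:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical option area for one outgoing segment.
 * TCP options can occupy at most 40 bytes (60-byte header - 20 fixed). */
struct opt_buf {
	uint8_t data[40];
	size_t used;
};

/* Return 1 if an option of this kind is already stored, else 0. */
static int opt_kind_present(const struct opt_buf *b, uint8_t kind)
{
	size_t i = 0;

	while (i < b->used) {
		uint8_t k = b->data[i];
		uint8_t l;

		if (k == 0)		/* End of Option List */
			break;
		if (k == 1) {		/* NOP is a single byte */
			i++;
			continue;
		}
		if (i + 1 >= b->used)	/* no room for a length byte */
			break;
		l = b->data[i + 1];
		if (l < 2 || l > b->used - i)	/* malformed length */
			break;
		if (k == kind)
			return 1;
		i += l;
	}
	return 0;
}

/* Append one option laid out as kind, length, value...
 * Return codes are assumptions of this sketch. */
static int store_opt(struct opt_buf *b, const uint8_t *opt, size_t len)
{
	if (len < 2 || opt[1] != len)
		return -EINVAL;		/* length byte must cover the whole option */
	if (opt[0] == 0 || opt[0] == 1)
		return -EINVAL;		/* EOL and NOP carry no length byte */
	if (len > sizeof(b->data) - b->used)
		return -ENOSPC;		/* would overflow the option area */
	if (opt_kind_present(b, opt[0]))
		return -EEXIST;		/* this kind was already written */
	memcpy(b->data + b->used, opt, len);
	b->used += len;
	return 0;
}
```

In this sketch, storing the same kind a second time fails and leaves the
buffer unchanged, so two programs cannot both emit the same option kind.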
> >
> > Allowing the bpf-prog to write any option kind gives it a lot of
> > flexibility.  Different bpf-progs can each write their
> > own option kind.  It could also allow a bpf-prog to support a
> > recently standardized option on an older kernel.
> >
> > Sockops Callback Flags:
> > ──────────────────────
> > The header parsing and writing callbacks can be turned on
> > by enabling a few newly added callback flags:
> >
> > BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG:
> >         Call bpf when the kernel has received a header option that
> >         the kernel cannot handle.  It is useful when the peer doesn't
> >         send bpf-options very often.
> >
> >         The bpf-prog can inspect the received header through
> >         sock_ops->skb_data, which covers the whole header (including
> >         the fixed fields like ports, flags, etc.), or
> >         use the new bpf_load_hdr_opt() to search for a particular TCP
> >         header option.
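
For readers less familiar with the option layout, searching for a particular
option amounts to walking the kind/length/value list that follows the fixed
TCP header.  Below is a minimal user-space sketch of that walk.  It is not
the bpf_load_hdr_opt() implementation; find_tcp_opt() and the OPT_* names
are hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL 0	/* End of Option List, single byte */
#define OPT_NOP 1	/* No-Operation padding, single byte */

/* Hypothetical helper: walk the option bytes and return a pointer to the
 * kind byte of the first option matching 'kind', or NULL if it is absent
 * or the list is malformed.  EOL and NOP can never be matched. */
static const uint8_t *find_tcp_opt(const uint8_t *opts, size_t len,
				   uint8_t kind)
{
	size_t i = 0;

	assert(opts != NULL || len == 0);

	while (i < len) {
		uint8_t k = opts[i];
		uint8_t optlen;

		if (k == OPT_EOL)
			break;
		if (k == OPT_NOP) {
			i++;
			continue;
		}
		if (i + 1 >= len)	/* truncated before the length byte */
			break;
		optlen = opts[i + 1];
		if (optlen < 2 || optlen > len - i)	/* malformed length */
			break;
		if (k == kind)
			return &opts[i];
		i += optlen;
	}
	return NULL;
}
```

Given the bytes NOP, NOP, a 10-byte timestamp option (kind 8) and a 4-byte
experimental option (kind 253), a search for kind 253 returns a pointer to
offset 12, while a search for a kind that is not present returns NULL.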
> >
> > [1]: draft-wang-tcpm-low-latency-opt-00
> >      https://tools.ietf.org/html/draft-wang-tcpm-low-latency-opt-00
> >
> > Signed-off-by: Martin KaFai Lau <kafai@...com>
> > ---
> >  include/linux/bpf-cgroup.h     |  25 +++
> >  include/linux/filter.h         |   4 +
> >  include/net/tcp.h              |  53 ++++-
> >  include/uapi/linux/bpf.h       | 231 ++++++++++++++++++++-
> >  net/core/filter.c              | 365 +++++++++++++++++++++++++++++++++
> >  net/ipv4/tcp_fastopen.c        |   2 +-
> >  net/ipv4/tcp_input.c           |  86 +++++++-
> >  net/ipv4/tcp_ipv4.c            |   3 +-
> >  net/ipv4/tcp_minisocks.c       |   1 +
> >  net/ipv4/tcp_output.c          | 194 ++++++++++++++++--
> >  net/ipv6/tcp_ipv6.c            |   3 +-
> >  tools/include/uapi/linux/bpf.h | 231 ++++++++++++++++++++-
> >  12 files changed, 1171 insertions(+), 27 deletions(-)
> 
> This is a truly gigantic patch.
> 
> Could you split it into maybe two parts?
Yes.

Most of the code changes in TCP call out to the bpf prog to parse and
write the header.  Thus, they are all in this one patch.

I will move those callout changes (and a few func arg changes) in TCP
into a separate patch but leave the bpf callout functions empty.

Then the next, bpf-specific patch will fill in those empty bpf
callout functions.

> 
> This way I could focus on the TCP changes, and let eBPF experts focus
> on BPF changes.
Thanks for the review!