netdev - Re: How to limit TCP packet lengths given to TC egress EBPF programs?


Message-ID: <CAADnVQJ9M6ip6uYb9ky=eH-Z1BO-cTeGOpYs0M3EZrgURWpNcQ@mail.gmail.com>
Date:   Tue, 13 Jul 2021 16:51:55 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     "Fingerhut, John Andy" <john.andy.fingerhut@...el.com>,
        bpf <bpf@...r.kernel.org>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Petr Lapukhov <petr@...com>,
        Sandesh Dhawaskar Sathyanarayana 
        <Sandesh.DhawaskarSathyanarayana@...orado.edu>,
        Daniel Borkmann <daniel@...earbox.net>,
        Toke Høiland-Jørgensen <toke@...hat.com>
Subject: Re: How to limit TCP packet lengths given to TC egress EBPF programs?

On Fri, Jul 9, 2021 at 11:40 AM Fingerhut, John Andy
<john.andy.fingerhut@...el.com> wrote:
>
> Greetings:
>
> I am working on a project that runs an EBPF program on the Linux
> Traffic Control egress hook, which modifies selected packets to add
> headers to them that we use for some network telemetry.
>
> I know that this is _not_ what one wants to do to get maximum TCP
> performance, but at least for development purposes I was hoping to
> find a way to limit the length of all TCP packets that are processed
> by this EBPF program to be at most one MTU.
>
> Towards that goal, we have tried several things, but regardless of
> which subset of the following things we have tried, there are some
> packets processed by our EBPF program that have IPv4 Total Length
> field that is some multiple of the MSS size, sometimes nearly 64
> KBytes.  If it makes a difference in configuration options available,
> we have primarily been testing with Ubuntu 20.04 Linux running the
> Linux kernel versions near 5.8.0-50-generic distributed by Canonical.
>
> Disable TSO and GSO on the network interface:
>
>     ethtool -K enp0s8 tso off gso off
>
> Configuring TCP MSS using 'ip route' command:
>
>     ip route change 10.0.3.0/24 dev enp0s8 advmss 1424
>
> The last command _does_ have some effect, in that many packets
> processed by our EBPF program have a length affected by that advmss
> value, but we still see many packets that are about twice as large,
> about three times as large, etc., which fit into that MSS after being
> segmented, I believe in the kernel GSO code.
>
> Is there some other configuration option we can change that can
> guarantee that when a TCP packet is given to a TC egress EBPF program,
> it will always be at most a specified length?
>
>
> Background:
>
> Intel is developing and releasing some open source EBPF programs and
> associated user space programs that modify packets to add INT (Inband
> Network Telemetry) headers, which can be used for some kinds of
> performance debugging reasons, e.g. triggering events when packet
> losses are detected, or significant changes in one-way packet latency
> between two hosts configured to run this Host INT code.  See the
> project home page for more details if you are interested:
>
> https://github.com/intel/host-int

I suspect the MTU/MSS issue is only the tip of the iceberg.
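
To make the sizes reported in the quoted message concrete, here is a minimal userspace sketch of the arithmetic. It estimates how many wire segments an aggregated TCP packet will be split into. The function name est_gso_segs is made up for illustration, and the 20-byte IPv4 and TCP header lengths used in the example values below are assumptions (no IP or TCP options); only the 1424 figure comes from the advmss setting quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the number of wire segments for an IPv4/TCP packet as seen
 * before segmentation. All sizes are in bytes.
 *
 * payload_per_seg is the TCP payload carried by each wire segment
 * (assumption: the MSS minus any TCP option bytes).
 * A packet with no payload (e.g. a pure ACK) yields 0.
 */
uint32_t est_gso_segs(uint32_t ip_total_len, uint32_t ip_hdr_len,
                      uint32_t tcp_hdr_len, uint32_t payload_per_seg)
{
    assert(payload_per_seg > 0);
    assert(ip_total_len >= ip_hdr_len + tcp_hdr_len);

    uint32_t payload = ip_total_len - ip_hdr_len - tcp_hdr_len;

    /* Round up: a partial trailing segment still needs its own packet. */
    return (payload + payload_per_seg - 1) / payload_per_seg;
}
```

With those assumed header sizes, a Total Length of 1464 maps to one segment, 2888 to two, and a packet near 64 KBytes to several dozen. That matches the "about twice as large, about three times as large" pattern described in the quoted text.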

https://github.com/intel/host-int/blob/main/docs/Host_INT_fmt.md
That's an interesting design!
A few things should probably be addressed sooner rather than later:
"Host INT currently only supports adding INT headers to IPv4 packets."
For such a feature of Tofino switches to be considered, IPv6 has to be
supported.
That shouldn't be hard to do, right?

https://github.com/intel/host-int/blob/main/docs/host-int-project.pptx
That's a lot of bpf programs :)
Looks like in the bridge case (last slide) every incoming packet will
be processed by two XDP programs.
XDP is certainly fast, but it still adds overhead.
Not every packet will have such an INT header, so most of the packets will
just pass through the XDP prog into the stack, or from the stack through the
TC egress program.
Such XDP ingress and TC egress progs will add overhead that might be
unacceptable in a production deployment.
Have you considered using the new TCP header option instead?
https://lore.kernel.org/bpf/CAADnVQJ21Tt2HaJ5P4wbxBLVo1YT-PwN3bOHBQK+17reK5HxOg@mail.gmail.com/
A BPF prog can conditionally add it for a few packets/flows, and another BPF
prog on the receive side will process that header option.
Meanwhile, the Tofino switch will find packets with the special TCP header
option and fill it in with telemetry data.
The "INT report packets are sent as UDP datagrams" part of the design can stay.
Looks like you're reserving a UDP port for such a purpose, so no need
for the receive side to have an XDP program to process every packet.

With the TCP header option approach the MTU issue will go away as well.
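
As a rough illustration of what such an option could look like on the wire, here is a minimal userspace sketch that encodes an experimental TCP option in the RFC 6994 layout (kind 254, length, 16-bit ExID, data) and pads it to a 4-byte boundary with NOPs. The function name int_opt_encode and the ExID value are hypothetical and are not taken from the Host INT project or from the linked thread. In a real sockops program, bytes like these would presumably be written with a helper such as bpf_store_hdr_opt() instead of into a plain buffer. The presumed reason the MTU issue goes away is that the option is added by the TCP stack itself, before segmentation, so its size is accounted for when segments are built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TCPOPT_NOP          1
#define TCPOPT_EXP          254     /* experimental option kind (RFC 4727) */
#define TCP_MAX_OPT_SPACE   40      /* 60-byte max TCP header minus 20 fixed */
#define HYPOTHETICAL_EXID   0xEB9F  /* made-up ExID, for illustration only */

/*
 * Encode kind | length | ExID (2 bytes) | data into buf, then pad with
 * NOPs so the total is a multiple of 4 bytes.
 * Returns the number of bytes written, or 0 if the option does not fit.
 */
size_t int_opt_encode(uint8_t *buf, size_t buflen,
                      const uint8_t *data, size_t datalen)
{
    size_t optlen = 2 + 2 + datalen;              /* kind, len, ExID, data */
    size_t padded = (optlen + 3) & ~(size_t)3;    /* round up to 4 bytes */

    if (padded > TCP_MAX_OPT_SPACE || padded > buflen)
        return 0;

    buf[0] = TCPOPT_EXP;
    buf[1] = (uint8_t)optlen;                     /* length excludes padding */
    buf[2] = (uint8_t)(HYPOTHETICAL_EXID >> 8);
    buf[3] = (uint8_t)(HYPOTHETICAL_EXID & 0xff);
    memcpy(buf + 4, data, datalen);
    memset(buf + optlen, TCPOPT_NOP, padded - optlen);

    return padded;
}
```

Note that the 40 bytes of option space are shared with whatever else the connection uses (timestamps, SACK, and so on), so the room left for telemetry data is small. That limit is worth checking against the INT header format before committing to this approach.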

> Note: The code published now is an alpha release.  We know there are
> bugs.  We know our development team is not what you would call EBPF
> experts (at least not yet), so feel free to point out bugs and/or
> anything that code is doing that might be a bad idea.

Thank you for reaching out. We're here to help with your BPF/XDP needs :)

> Thanks,
> Andy Fingerhut
> Principal Engineer
> Intel Corporation