Date:   Tue, 13 Oct 2020 16:54:18 -0700
From:   Maciej Żenczykowski <maze@...gle.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        John Fastabend <john.fastabend@...il.com>,
        bpf <bpf@...r.kernel.org>, Linux NetDev <netdev@...r.kernel.org>,
        Daniel Borkmann <borkmann@...earbox.net>,
        Lorenz Bauer <lmb@...udflare.com>,
        Shaun Crampton <shaun@...era.io>,
        Lorenzo Bianconi <lorenzo@...nel.org>,
        Marek Majkowski <marek@...udflare.com>,
        Eyal Birger <eyal.birger@...il.com>
Subject: Re: [PATCH bpf-next V3 0/6] bpf: New approach for BPF MTU handling

> how about we set __bpf_skb_max_len() to jumbo like 8k and be done with it.

8k is still far too small.  A lot of places use 9K or 16K jumbo frames;
you'd need at least a full 16K to be truly jumbo compatible.

That said, if we're ever willing to ignore the device MTU, then I see
no reason why an 8K, 16K, or 32K limit is any better than 64K, which
at least matches the maximum IP packet size (let's ignore IPv6
jumbograms as unrealistic).
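
For reference, a rough sketch (not the actual patch) of what pinning
__bpf_skb_max_len() to a fixed cap might look like, assuming the helper
still derives its limit from the egress device as it does in
net/core/filter.c around this series; the 64K constant is purely
illustrative:

/* Sketch only: swap the device-derived cap for a fixed one.
 * The existing helper returns dev->mtu + dev->hard_header_len
 * (or SKB_MAX_ALLOC when there is no device); a fixed cap ignores
 * the device entirely, which is exactly the trade-off above.
 */
#include <linux/skbuff.h>
#include <linux/netdevice.h>

#define BPF_SKB_FIXED_MAX_LEN	(64 * 1024)	/* illustrative: max IPv4 packet size */

static u32 __bpf_skb_max_len(const struct sk_buff *skb)
{
	return BPF_SKB_FIXED_MAX_LEN;
}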

If something in the firmware/driver fails at 64K, it will probably
fail at 8K as well, since the 'bad' hardware is most likely old and
only capable of ~1500 bytes (or one page size) anyway...

In practice, driver limitations may be more about the number of pages
or SG sections rather than the maximum packet size anyway...
so failures may depend on the individual skb layout...
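
To make that concrete, here is a purely hypothetical driver-side guard
(not taken from any real driver): the hardware cares about how many DMA
descriptors an skb needs, not its total byte length, so an
over-fragmented skb gets linearized or dropped before reaching the
ring.  The per-packet descriptor limit is an assumption:

/* Hypothetical ndo_start_xmit guard: the constraint is descriptor
 * count, not packet length.
 */
#include <linux/skbuff.h>
#include <linux/netdevice.h>

#define EXAMPLE_HW_MAX_DESCS	8	/* assumed per-packet descriptor limit */

static netdev_tx_t example_start_xmit(struct sk_buff *skb,
				      struct net_device *dev)
{
	/* +1 for the linear part of the skb */
	unsigned int descs = skb_shinfo(skb)->nr_frags + 1;

	if (descs > EXAMPLE_HW_MAX_DESCS) {
		/* Fall back to one contiguous buffer; this can fail for
		 * large packets under memory pressure, so drop then.
		 */
		if (skb_linearize(skb)) {
			dev_kfree_skb_any(skb);
			dev->stats.tx_dropped++;
			return NETDEV_TX_OK;
		}
	}

	/* ... map buffers and post descriptors to the ring ... */
	return NETDEV_TX_OK;
}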

And as a reminder, there are interfaces (like lo) that default to a
64K MTU (and I have veth setups with a 64K MTU as well).

Btw, our GCE folks tell us they occasionally see (and now discard)
>mtu packets from Linux VMs using the virtio-net driver.
We've not had time to debug this (the VMs in question have some pretty
funky routing, and for privacy reasons I've not been able to get actual
dumps of the problematic frames), but my gut feeling is that >mtu
packets occasionally leak into the drivers (probably from the tcp stack).

> I guess some badly written driver/fw may still hang with <= 8k skb
> that bpf redirected from one netdev with mtu=jumbo to another
> netdev with mtu=1500, but then it's really a job of the driver/fw
> to deal with it cleanly.
>
> I think checking skb->tx_dev->mtu for every xmited packet is not great.
> For typical load balancer it would be good to have MRU 1500 and MTU 15xx.
> Especially if it's internet facing. Just to drop all known big
> packets in hw via MRU check.
> But the stack doesn't have MRU vs MTU distinction and XDP_TX doesn't
> adhere to MTU. xdp_data_hard_end is the limit.
> So xdp already allows growing the packet beyond MTU.
> I think upgrading artificial limit in __bpf_skb_max_len() to 8k will
> keep it safe enough for all practical cases and will avoid unnecessary
> checks and complexity in xmit path.
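
To illustrate the XDP point quoted above: from the program's side,
nothing ties frame growth to the MTU; bpf_xdp_adjust_tail() only fails
once the buffer's tailroom (xdp_data_hard_end internally) runs out.  A
minimal sketch, with an arbitrary 256-byte growth:

/* Sketch: grow the frame past the MTU and bounce it with XDP_TX.
 * Only the hard end of the buffer limits the growth.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int grow_then_tx(struct xdp_md *ctx)
{
	/* Append 256 bytes; this may exceed the egress MTU. */
	if (bpf_xdp_adjust_tail(ctx, 256))
		return XDP_DROP;	/* not enough tailroom */

	return XDP_TX;			/* send back out the same device */
}

char _license[] SEC("license") = "GPL";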
