Open Source and information security mailing list archives
Date:   Tue, 13 Oct 2020 16:37:34 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Jesper Dangaard Brouer <brouer@...hat.com>,
        John Fastabend <john.fastabend@...il.com>, bpf@...r.kernel.org,
        netdev@...r.kernel.org, Daniel Borkmann <borkmann@...earbox.net>,
        maze@...gle.com, lmb@...udflare.com, shaun@...era.io,
        Lorenzo Bianconi <lorenzo@...nel.org>, marek@...udflare.com,
        eyal.birger@...il.com
Subject: Re: [PATCH bpf-next V3 0/6] bpf: New approach for BPF MTU handling

On Tue, Oct 13, 2020 at 04:07:26PM -0700, Jakub Kicinski wrote:
> On Tue, 13 Oct 2020 22:40:09 +0200 Jesper Dangaard Brouer wrote:
> > > FWIW I took a quick swing at testing it with the HW I have and it did
> > > exactly what hardware should do. The TX unit entered an error state 
> > > and then the driver detected that and reset it a few seconds later.  
> > 
> > The drivers (i40e, mlx5, ixgbe) I tested with didn't enter an error
> > state when getting packets exceeding the MTU.  I didn't go much above
> > 4K, so maybe I didn't trigger those cases.
> 
> You probably need to go above 16k to get out of the acceptable jumbo
> frame size. I tested ixgbe by converting TSO frames to large TCP frames,
> at low probability.

How about we set __bpf_skb_max_len() to something jumbo, like 8k, and be done with it?

I guess some badly written driver/fw may still hang with <= 8k skb
that bpf redirected from one netdev with mtu=jumbo to another
netdev with mtu=1500, but then it's really a job of the driver/fw
to deal with it cleanly.

I think checking skb->tx_dev->mtu for every xmitted packet is not great.
For a typical load balancer it would be good to have MRU 1500 and MTU 15xx,
especially if it's internet facing, just to drop all known big
packets in hw via the MRU check.
But the stack doesn't have MRU vs MTU distinction and XDP_TX doesn't
adhere to MTU. xdp_data_hard_end is the limit.
So xdp already allows growing the packet beyond MTU.
I think upgrading artificial limit in __bpf_skb_max_len() to 8k will
keep it safe enough for all practical cases and will avoid unnecessary
checks and complexity in xmit path.
