Date:   Sat, 10 Oct 2020 09:32:12 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     John Fastabend <john.fastabend@...il.com>, bpf@...r.kernel.org,
        netdev@...r.kernel.org, Daniel Borkmann <borkmann@...earbox.net>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        maze@...gle.com, lmb@...udflare.com, shaun@...era.io,
        Lorenzo Bianconi <lorenzo@...nel.org>, marek@...udflare.com,
        eyal.birger@...il.com
Subject: Re: [PATCH bpf-next V3 0/6] bpf: New approach for BPF MTU handling

On Sat, 10 Oct 2020 12:44:02 +0200 Jesper Dangaard Brouer wrote:
> > > > We will not be sprinkling validation checks across the drivers because
> > > > some reconfiguration path may occasionally yield a bad packet, or it's
> > > > hard to do something right with BPF.      
> > > 
> > > This is a driver bug then. As it stands today, drivers may get hit
> > > with an skb longer than the set MTU, as best I can tell.    
> > 
> > You're talking about taking it from "maybe this can happen, but will
> > still be at most jumbo" to "it's going to be very easy to trigger and
> > length may be > MAX_U16".  
> 
> It is interesting that a misbehaving BPF program can easily trigger this.
> Next week, I will look into writing such a BPF prog and then test it on
> the hardware I have available in my testlab.
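
For concreteness, a minimal sketch of what such a misbehaving TC egress
program could look like (assuming the helper-side MTU cap is lifted as
this series proposes; the file, program and section names are
illustrative, not taken from the series):

/* grow_mtu.bpf.c - illustrative only: pad every egress frame far past
 * a 1500 byte MTU by growing its tail. */
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

SEC("classifier")
int grow_past_mtu(struct __sk_buff *skb)
{
	/* Grow the frame to roughly 16000 bytes; the helper's return
	 * value is ignored for brevity in this sketch. */
	bpf_skb_change_tail(skb, 16000, 0);
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";

Loaded on a clsact qdisc with something along the lines of
"tc filter add dev eth0 egress bpf da obj grow_mtu.o sec classifier"
(command illustrative), every outgoing packet would exceed the MTU.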

FWIW I took a quick swing at testing it with the HW I have and it did
exactly what hardware should do. The TX unit entered an error state 
and then the driver detected that and reset it a few seconds later.

Hardware is almost always designed to behave like that. If some NIC
actually cleanly drops oversized TX frames, I'd bet it's done in FW,
or some other software piece.

There was also a statement earlier in the thread that we can put a large
frame on the wire and "let the switch drop it". I don't believe
that's possible either (as I mentioned previously, BPF could generate
frames above jumbo size). My PHY knowledge is very rudimentary and
rusty, but from what I've heard Ethernet PHYs have a hard design limit
on the length of a frame they can put on a wire (or pull from it),
for reasons of symbol encoding, electrical charge on the wire, and so
on. There needs to be a bunch of idle symbols every now and then. And
obviously if one actually manages to get a longer frame to the PHY it
will fault, see above.

> > > Generally I expect drivers use MTU to configure RX buffers; not sure
> > > how it is going to be used on the TX side? Any examples? I just poked
> > > around through the driver sources, and that seems to confirm it's
> > > primarily for RX side configuration, with some drivers throwing the
> > > event down to the firmware for something that I can't see in the code?    
> > 
> > Right, but that could just be because nobody expects to get oversized
> > frames from the stack.
> > 
> > We actively encourage drivers to remove paranoid checks. It's really
> > not going to be a great experience for driver authors if they need
> > to consult a list of things they should and shouldn't check.
> > 
> > If we want to do this, the driver interface must most definitely say 
> > MRU and not MTU.  
> 
> What is MRU?

Max Receive Unit. Jesse and others have been talking about how we
should separate the TX config from the RX config for drivers. Right now
drivers configure RX filters based on the max transmission unit,
which is weird, and nobody is sure whether that's actually desired.
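
To illustrate the pattern (a hypothetical sketch, not code from any
real driver; example_priv and example_change_mtu are made-up names):
a typical .ndo_change_mtu derives RX buffer sizing from the MTU, while
nothing on the TX path re-checks frame length against it.

#include <linux/etherdevice.h>
#include <linux/netdevice.h>

struct example_priv {
	u32 rx_buf_len;
};

static int example_change_mtu(struct net_device *dev, int new_mtu)
{
	struct example_priv *priv = netdev_priv(dev);

	dev->mtu = new_mtu;
	/* RX buffer size derived from the *transmit* unit - nothing
	 * here constrains what the stack may hand us on TX. */
	priv->rx_buf_len = SKB_DATA_ALIGN(new_mtu + ETH_HLEN + ETH_FCS_LEN);
	return 0;
}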

> > > I'm not suggesting sprinkling validation checks across the drivers.
> > > I'm suggesting that if the drivers hang, we fix them.    
> > 
> > We both know the level of testing drivers get; it's unlikely this will
> > be validated. It's a packet of death waiting to happen. 
> > 
> > And all this for what? Saving 2 cycles on a branch that will almost
> > never be taken?  
> 
> I do think it is risky not to do this simple MTU check in net-core.  I
> also believe the overhead is very, very low.  Hint: I'm basically just
> moving the MTU check from one place to another.  (And the last patch in
> the patchset is an optimization that inlines and saves cycles when doing
> these kinds of MTU checks.)
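
For concreteness, a rough sketch of the kind of net-core check being
described (loosely modeled on is_skb_forwardable(); this is not the
code from the patchset, and skb_fits_dev_mtu is a made-up name):

#include <linux/if_vlan.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* True if the frame is small enough for the egress device, or will be
 * segmented by GSO before it reaches the hardware. */
static inline bool skb_fits_dev_mtu(const struct sk_buff *skb,
				    const struct net_device *dev)
{
	unsigned int max_len = dev->mtu + dev->hard_header_len + VLAN_HLEN;

	return skb_is_gso(skb) || skb->len <= max_len;
}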
