[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160711133632.483bf2cb@redhat.com>
Date: Mon, 11 Jul 2016 13:36:32 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Tom Herbert <tom@...bertland.com>
Cc: Brenden Blanco <bblanco@...mgrid.com>,
"David S. Miller" <davem@...emloft.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Martin KaFai Lau <kafai@...com>, Ari Saha <as754m@....com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Or Gerlitz <gerlitz.or@...il.com>,
john fastabend <john.fastabend@...il.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
Thomas Graf <tgraf@...g.ch>,
Daniel Borkmann <daniel@...earbox.net>, brouer@...hat.com
Subject: Re: [PATCH v6 01/12] bpf: add XDP prog type for early driver filter
On Sun, 10 Jul 2016 15:27:38 -0500
Tom Herbert <tom@...bertland.com> wrote:
> On Sun, Jul 10, 2016 at 8:37 AM, Jesper Dangaard Brouer
> <brouer@...hat.com> wrote:
> > On Sat, 9 Jul 2016 08:47:52 -0500
> > Tom Herbert <tom@...bertland.com> wrote:
> >
> >> On Sat, Jul 9, 2016 at 3:14 AM, Jesper Dangaard Brouer
> >> <brouer@...hat.com> wrote:
> >> > On Thu, 7 Jul 2016 19:15:13 -0700
> >> > Brenden Blanco <bblanco@...mgrid.com> wrote:
> >> >
> >> >> Add a new bpf prog type that is intended to run in early stages of the
> >> >> packet rx path. Only minimal packet metadata will be available, hence a
> >> >> new context type, struct xdp_md, is exposed to userspace. So far only
> >> >> expose the packet start and end pointers, and only in read mode.
> >> >>
> >> >> An XDP program must return one of the well known enum values, all other
> >> >> return codes are reserved for future use. Unfortunately, this
> >> >> restriction is hard to enforce at verification time, so take the
> >> >> approach of warning at runtime when such programs are encountered. The
> >> >> driver can choose to implement unknown return codes however it wants,
> >> >> but must invoke the warning helper with the action value.
> >> >
> >> > I believe we should define a stronger semantics for unknown/future
> >> > return codes than the once stated above:
> >> > "driver can choose to implement unknown return codes however it wants"
> >> >
> >> > The mlx4 driver implementation in:
> >> > [PATCH v6 04/12] net/mlx4_en: add support for fast rx drop bpf program
> >> >
> >> > On Thu, 7 Jul 2016 19:15:16 -0700 Brenden Blanco <bblanco@...mgrid.com> wrote:
> >> >
> >> >> + /* A bpf program gets first chance to drop the packet. It may
> >> >> + * read bytes but not past the end of the frag.
> >> >> + */
> >> >> + if (prog) {
> >> >> + struct xdp_buff xdp;
> >> >> + dma_addr_t dma;
> >> >> + u32 act;
> >> >> +
> >> >> + dma = be64_to_cpu(rx_desc->data[0].addr);
> >> >> + dma_sync_single_for_cpu(priv->ddev, dma,
> >> >> + priv->frag_info[0].frag_size,
> >> >> + DMA_FROM_DEVICE);
> >> >> +
> >> >> + xdp.data = page_address(frags[0].page) +
> >> >> + frags[0].page_offset;
> >> >> + xdp.data_end = xdp.data + length;
> >> >> +
> >> >> + act = bpf_prog_run_xdp(prog, &xdp);
> >> >> + switch (act) {
> >> >> + case XDP_PASS:
> >> >> + break;
> >> >> + default:
> >> >> + bpf_warn_invalid_xdp_action(act);
> >> >> + case XDP_DROP:
> >> >> + goto next;
> >> >> + }
> >> >> + }
> >> >
> >> > Thus, mlx4 choice is to drop packets for unknown/future return codes.
> >> >
> >> > I think this is the wrong choice. I think the choice should be
> >> > XDP_PASS, to pass the packet up the stack.
> >> >
> >> > I find "XDP_DROP" problematic because it happen so early in the driver,
> >> > that we lost all possibilities to debug what packets gets dropped. We
> >> > get a single kernel log warning, but we cannot inspect the packets any
> >> > longer. By defaulting to XDP_PASS all the normal stack tools (e.g.
> >> > tcpdump) is available.
> >> >
> >>
> >> It's an API issue though not a problem with the packet. Allowing
> >> unknown return codes to pass seems like a major security problem also.
> >
> > We have the full power and flexibility of the normal Linux stack to
> > drop these packets. And from a usability perspective it gives insight
> > into what is wrong and counters metrics. Would you rather blindly drop
> > e.g. 0.01% of the packets in your data-centers without knowing.
> >
> This is not blindly dropping packets; the bad action should be logged,
> counters incremented, and packet could be passed to the stack as an
> error if deeper inspection is needed.
Well, the patch only logs a single warning. There is no method of
counting or passing to the stack in this proposal. And adding such
things is a performance regression risk, and DoS vector in itself.
> IMO, I would rather drop
> something not understood than accept it-- determinism is a goal also.
>
> > We already talk about XDP as an offload mechanism. Normally when
> > loading a (XDP) "offload" program it should be rejected, e.g. by the
> > validator. BUT we cannot validate all return eBPF codes, because they
> > can originate from a table lookup. Thus, we _do_ allow programs to be
> > loaded, with future unknown return code.
> > This then corresponds to only part of the program can be offloaded,
> > thus the natural response is to fallback, handling this is the
> > non-offloaded slower-path.
> >
> > I see the XDP_PASS fallback as a natural way of supporting loading
> > newer/future programs on older "versions" of XDP.
>
> Then in this model we could only add codes that allow passing packets.
> For instance, what if a new return code means "Drop this packet and
> log it as critical because if you receive it the stack will crash"?
Drop is drop. I don't see how we would need to drop in a "new" way.
If you need to log a critical event do it in the eBPF program.
> ;-) IMO ignoring something not understood for the sake of
> extensibility is a red herring. In the long run doing this actually
> limits are ability to extend things for both APIs and protocols (a
> great example of this is VLXAN that mandates unknown flags are
> ignored in RX so VXLAN-GPE has a be a new incompatible protocol to get
> a next protocol field).
>
> > E.g. I can have a XDP program that have a valid filter protection
> > mechanism, but also use a newer mechanism, and my server fleet contains
> > different NIC vendors, some NICs only support the filter part. Then I
> > want to avoid having to compile and maintain different XDP/eBPF
> > programs per NIC vendor. (Instead I prefer having a Linux stack
> > fallback mechanism, and transparently XDP offload as much as the NIC
> > driver supports).
> >
> As Brenden points out, fallbacks easily become DOS vectors.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
Powered by blists - more mailing lists