[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200819090756.07f76eb9@carbon>
Date: Wed, 19 Aug 2020 09:07:56 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: "Jason A. Donenfeld" <Jason@...c4.com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org,
Thomas Ptacek <thomas@...kpuppet.org>,
Adhipati Blambangan <adhipati@...a.io>,
David Ahern <dsahern@...il.com>,
Toke Høiland-Jørgensen <toke@...hat.com>,
Jakub Kicinski <kuba@...nel.org>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
John Fastabend <john.fastabend@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
"David S . Miller" <davem@...emloft.net>, brouer@...hat.com
Subject: Re: [PATCH net v6] net: xdp: account for layer 3 packets in generic
skb handler
On Sat, 15 Aug 2020 09:41:02 +0200
"Jason A. Donenfeld" <Jason@...c4.com> wrote:
> A user reported that packets from wireguard were possibly ignored by XDP
> [1]. Another user reported that modifying packets from layer 3
> interfaces results in impossible to diagnose drops.
>
> Apparently, the generic skb xdp handler path seems to assume that
> packets will always have an ethernet header, which really isn't always
> the case for layer 3 packets, which are produced by multiple drivers.
> This patch fixes the oversight. If the mac_len is 0 and so is
> hard_header_len, then we know that the skb is a layer 3 packet, and in
> that case prepend a pseudo ethhdr to the packet whose h_proto is copied
> from skb->protocol, which will have the appropriate v4 or v6 ethertype.
> This allows us to keep XDP programs' assumption correct about packets
> always having that ethernet header, so that existing code doesn't break,
> while still allowing layer 3 devices to use the generic XDP handler.
>
> We push on the ethernet header and then pull it right off and set
> mac_len to the ethernet header size, so that the rest of the XDP code
> does not need any changes. That is, it makes it so that the skb has its
> ethernet header just before the data pointer, of size ETH_HLEN.
>
> Previous discussions have included the point that maybe XDP should just
> be intentionally broken on layer 3 interfaces, by design, and that layer
> 3 people should be using cls_bpf. However, I think there are good
> grounds to reconsider this perspective:
>
> - Complicated deployments wind up applying XDP modifications to a
> variety of different devices on a given host, some of which are using
> specialized ethernet cards and other ones using virtual layer 3
> interfaces, such as WireGuard. Being able to apply one codebase to
> each of these winds up being essential.
>
> - cls_bpf does not support the same feature set as XDP, and operates at
> a slightly different stage in the networking stack. You may reply,
> "then add all the features you want to cls_bpf", but that seems to be
> missing the point, and would still result in there being two ways to
> do everything, which is not desirable for anyone actually _using_ this
> code.
>
> - While XDP was originally made for hardware offloading, and while many
> look disdainfully upon the generic mode, it nevertheless remains a
> highly useful and popular way of adding bespoke packet
> transformations, and from that perspective, a difference between layer
> 2 and layer 3 packets is immaterial if the user is primarily concerned
> with transformations to layer 3 and beyond.
>
> - It's not impossible to imagine layer 3 hardware (e.g. a WireGuard PCIe
> card) including eBPF/XDP functionality built-in. In that case, why
> limit XDP as a technology to only layer 2? Then, having generic XDP
> work for layer 3 would naturally fit as well.
>
> [1] https://lore.kernel.org/wireguard/M5WzVK5--3-2@tuta.io/
>
> Reported-by: Thomas Ptacek <thomas@...kpuppet.org>
> Reported-by: Adhipati Blambangan <adhipati@...a.io>
> Cc: David Ahern <dsahern@...il.com>
> Cc: Toke Høiland-Jørgensen <toke@...hat.com>
> Cc: Jakub Kicinski <kuba@...nel.org>
> Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>
> Cc: Jesper Dangaard Brouer <brouer@...hat.com>
> Cc: John Fastabend <john.fastabend@...il.com>
> Cc: Daniel Borkmann <daniel@...earbox.net>
> Cc: David S. Miller <davem@...emloft.net>
> Signed-off-by: Jason A. Donenfeld <Jason@...c4.com>
> ---
>
> I had originally dropped this patch, but the issue kept coming up in
> user reports, so here's a v4 of it. Testing of it is still rather slim,
> but hopefully that will change in the coming days.
>
> Changes v5->v6:
> - The fix to the skb->protocol changing case is now in a separate
> stand-alone patch, and removed from this one, so that it can be
> evaluated separately.
>
> Changes v4->v5:
> - Rather than tracking in a messy manner whether the skb is l3, we just
> do the check once, and then adjust the skb geometry to be identical to
> the l2 case. This simplifies the code quite a bit.
> - Fix a preexisting bug where the l2 header remained attached if
> skb->protocol was updated.
>
> Changes v3->v4:
> - We now preserve the same logic for XDP_TX/XDP_REDIRECT as before.
> - hard_header_len is checked in addition to mac_len.
>
> net/core/dev.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 151f1651439f..79c15f4244e6 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4630,6 +4630,18 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> * header.
> */
> mac_len = skb->data - skb_mac_header(skb);
> + if (!mac_len && !skb->dev->hard_header_len) {
> + /* For l3 packets, we push on a fake mac header, and then
> + * pull it off again, so that it has the same skb geometry
> + * as for the l2 case.
> + */
> + eth = skb_push(skb, ETH_HLEN);
> + eth_zero_addr(eth->h_source);
> + eth_zero_addr(eth->h_dest);
> + eth->h_proto = skb->protocol;
> + __skb_pull(skb, ETH_HLEN);
> + mac_len = ETH_HLEN;
> + }
You are consuming a little bit of the headroom here.
> hlen = skb_headlen(skb) + mac_len;
> xdp->data = skb->data - mac_len;
> xdp->data_meta = xdp->data;
The XDP-prog is allowed to change eth->h_proto. Later (in code) we
detect this and update skb->protocol with the new protocol.
What will happen if my XDP-prog adds a VLAN header?
The selftest tools/testing/selftests/bpf/test_xdp_vlan.sh test these
situations. You can use it as an example, and write/construct a test
that does the same for your Wireguard devices. As minimum you need to
provide such a selftest together with this patch.
Generally speaking, IMHO generic-XDP was a mistake, because it is hard
to maintain and slows down the development of XDP. (I have a number of
fixes in my TODO backlog for generic-XDP). Adding this will just give
us more corner-cases that need to be maintained.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
Powered by blists - more mailing lists