[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <451b291a-7798-cfe2-84da-815937b54f70@iogearbox.net>
Date: Thu, 15 Dec 2022 23:31:52 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Daniel Xu <dxu@...uu.xyz>, "David S. Miller" <davem@...emloft.net>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
David Ahern <dsahern@...nel.org>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Cc: ppenkov@...atrix.com, dbird@...atrix.com,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next 3/6] bpf, net, frags: Add bpf_ip_check_defrag()
kfunc
Hi Daniel,
Thanks for working on this!
On 12/15/22 12:25 AM, Daniel Xu wrote:
[...]
> +#include <linux/bpf.h>
> +#include <linux/btf_ids.h>
> +#include <linux/ip.h>
> +#include <linux/filter.h>
> +#include <linux/netdevice.h>
> +#include <net/ip.h>
> +#include <net/sock.h>
> +
> +__diag_push();
> +__diag_ignore_all("-Wmissing-prototypes",
> + "Global functions as their definitions will be in ip_fragment BTF");
> +
> +/* bpf_ip_check_defrag - Defragment an ipv4 packet
> + *
> + * This helper takes an skb as input. If this skb successfully reassembles
> + * the original packet, the skb is updated to contain the original, reassembled
> + * packet.
> + *
> + * Otherwise (on error or incomplete reassembly), the input skb remains
> + * unmodified.
> + *
> + * Parameters:
> + * @ctx - Pointer to program context (skb)
> + * @netns - Child network namespace id. If value is a negative signed
> + * 32-bit integer, the netns of the device in the skb is used.
> + *
> + * Return:
> + * 0 on successfully reassembly or non-fragmented packet. Negative value on
> + * error or incomplete reassembly.
> + */
> +int bpf_ip_check_defrag(struct __sk_buff *ctx, u64 netns)
small nit: for sk lookup helper we've used u32 netns_id, would be nice to have
this consistent here as well.
> +{
> + struct sk_buff *skb = (struct sk_buff *)ctx;
> + struct sk_buff *skb_cpy, *skb_out;
> + struct net *caller_net;
> + struct net *net;
> + int mac_len;
> + void *mac;
> +
> + if (unlikely(!((s32)netns < 0 || netns <= S32_MAX)))
> + return -EINVAL;
> +
> + caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
> + if ((s32)netns < 0) {
> + net = caller_net;
> + } else {
> + net = get_net_ns_by_id(caller_net, netns);
> + if (unlikely(!net))
> + return -EINVAL;
> + }
> +
> + mac_len = skb->mac_len;
> + skb_cpy = skb_copy(skb, GFP_ATOMIC);
> + if (!skb_cpy)
> + return -ENOMEM;
Given slow path, this idea is expensive but okay. Maybe in future it could be lifted
which might be a bigger lift to teach verifier that input ctx cannot be accessed
anymore.. but then frags are very much discouraged either way and bpf_ip_check_defrag()
might only apply in corner case situations (like DNS, etc).
> + skb_out = ip_check_defrag(net, skb_cpy, IP_DEFRAG_BPF);
> + if (IS_ERR(skb_out))
> + return PTR_ERR(skb_out);
Looks like ip_check_defrag() can gracefully handle IPv6 packet. It will just return back
skb_cpy pointer in that case. However, this brings me to my main complaint.. I don't
think we should merge anything IPv4-related without also having IPv6 equivalent support,
otherwise we're building up tech debt, so pls also add support for the latter.
> + skb_morph(skb, skb_out);
> + kfree_skb(skb_out);
> +
> + /* ip_check_defrag() does not maintain mac header, so push empty header
> + * in so prog sees the correct layout. The empty mac header will be
> + * later pulled from cls_bpf.
> + */
> + mac = skb_push(skb, mac_len);
> + memset(mac, 0, mac_len);
> + bpf_compute_data_pointers(skb);
> +
> + return 0;
> +}
> +
Thanks,
Daniel
Powered by blists - more mailing lists