linux-kernel - Re: [PATCH bpf-next 3/6] bpf, net, frags: Add bpf_ip_check

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <451b291a-7798-cfe2-84da-815937b54f70@iogearbox.net>
Date:   Thu, 15 Dec 2022 23:31:52 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Daniel Xu <dxu@...uu.xyz>, "David S. Miller" <davem@...emloft.net>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        David Ahern <dsahern@...nel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>
Cc:     ppenkov@...atrix.com, dbird@...atrix.com,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next 3/6] bpf, net, frags: Add bpf_ip_check_defrag()
 kfunc

Hi Daniel,

Thanks for working on this!

On 12/15/22 12:25 AM, Daniel Xu wrote:
[...]
> +#include <linux/bpf.h>
> +#include <linux/btf_ids.h>
> +#include <linux/ip.h>
> +#include <linux/filter.h>
> +#include <linux/netdevice.h>
> +#include <net/ip.h>
> +#include <net/sock.h>
> +
> +__diag_push();
> +__diag_ignore_all("-Wmissing-prototypes",
> +		  "Global functions as their definitions will be in ip_fragment BTF");
> +
> +/* bpf_ip_check_defrag - Defragment an ipv4 packet
> + *
> + * This helper takes an skb as input. If this skb successfully reassembles
> + * the original packet, the skb is updated to contain the original, reassembled
> + * packet.
> + *
> + * Otherwise (on error or incomplete reassembly), the input skb remains
> + * unmodified.
> + *
> + * Parameters:
> + * @ctx		- Pointer to program context (skb)
> + * @netns	- Child network namespace id. If value is a negative signed
> + *		  32-bit integer, the netns of the device in the skb is used.
> + *
> + * Return:
> + * 0 on successfully reassembly or non-fragmented packet. Negative value on
> + * error or incomplete reassembly.
> + */
> +int bpf_ip_check_defrag(struct __sk_buff *ctx, u64 netns)

small nit: for sk lookup helper we've used u32 netns_id, would be nice to have
this consistent here as well.

> +{
> +	struct sk_buff *skb = (struct sk_buff *)ctx;
> +	struct sk_buff *skb_cpy, *skb_out;
> +	struct net *caller_net;
> +	struct net *net;
> +	int mac_len;
> +	void *mac;
> +
> +	if (unlikely(!((s32)netns < 0 || netns <= S32_MAX)))
> +		return -EINVAL;
> +
> +	caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
> +	if ((s32)netns < 0) {
> +		net = caller_net;
> +	} else {
> +		net = get_net_ns_by_id(caller_net, netns);
> +		if (unlikely(!net))
> +			return -EINVAL;
> +	}
> +
> +	mac_len = skb->mac_len;
> +	skb_cpy = skb_copy(skb, GFP_ATOMIC);
> +	if (!skb_cpy)
> +		return -ENOMEM;

Given slow path, this idea is expensive but okay. Maybe in future it could be lifted
which might be a bigger lift to teach verifier that input ctx cannot be accessed
anymore.. but then frags are very much discouraged either way and bpf_ip_check_defrag()
might only apply in corner case situations (like DNS, etc).

> +	skb_out = ip_check_defrag(net, skb_cpy, IP_DEFRAG_BPF);
> +	if (IS_ERR(skb_out))
> +		return PTR_ERR(skb_out);

Looks like ip_check_defrag() can gracefully handle IPv6 packet. It will just return back
skb_cpy pointer in that case. However, this brings me to my main complaint.. I don't
think we should merge anything IPv4-related without also having IPv6 equivalent support,
otherwise we're building up tech debt, so pls also add support for the latter.

> +	skb_morph(skb, skb_out);
> +	kfree_skb(skb_out);
> +
> +	/* ip_check_defrag() does not maintain mac header, so push empty header
> +	 * in so prog sees the correct layout. The empty mac header will be
> +	 * later pulled from cls_bpf.
> +	 */
> +	mac = skb_push(skb, mac_len);
> +	memset(mac, 0, mac_len);
> +	bpf_compute_data_pointers(skb);
> +
> +	return 0;
> +}
> +

Thanks,
Daniel