netdev - Re: [PATCH bpf-next 09/13] bpf: add bpf_skb_adjust

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LRH.2.20.1903201536150.18790@dhcp-10-175-198-37.vpn.oracle.com>
Date:   Wed, 20 Mar 2019 15:51:48 +0000 (GMT)
From:   Alan Maguire <alan.maguire@...cle.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
cc:     netdev@...r.kernel.org, ast@...nel.org, daniel@...earbox.net,
        sdf@...gle.com, posk@...gle.com,
        Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH bpf-next 09/13] bpf: add bpf_skb_adjust_room encap
 flags

On Wed, 20 Mar 2019, Willem de Bruijn wrote:

> From: Willem de Bruijn <willemb@...gle.com>
> 
> When pushing tunnel headers, annotate skbs in the same way as tunnel
> devices.
>
This is great stuff Willem!
 
> For GSO packets, the network stack requires certain fields set to
> segment packets with tunnel headers. gro_gse_segment depends on
> transport and inner mac header, for instance.
>

By coincidence I've been working on a patch to solve part of
this problem (attached).

I took a slightly different approach (which I think you mentioned
you considered) - adding an additional helper to mark the inner
headers.  The reason I needed this is that the mac header length
in my case was sometimes not the same as the outer mac size (it
could be a set of MPLS labels sometimes, or indeed there might
be no mac header at all).  If I'm reading your code correctly,
you derive  the mac header length from the outer mac header size -
would there be a possibility of overloading the flags field for
bpf_skb_adjust_room to use 8 bits to store a non-standard mac length 
perhaps? I'd be happy to work on that as a separate patch if that seems 
reasonable.
 
> Add an option to pass this information.
> 
> Remove the restriction on len_diff to network header length, which
> is too short, e.g., for GRE protocols.
>

I think this solves another problem I'd observed; when de-encapsulating
packets which had been GRO re-assembled, bpf_skb_adjust_room would
fail becuase GRO reassembly set the transport header, and as
shrinkage was limited to ensure we still had an IPv4/IPv6 header's
worth of space between the network and transport headers, the operation
would fail.  I think that problem is fixed here, is that right?

Reviewed-by: Alan Maguire <alan.maguire@...cle.com>

Thanks!

Alan

>From 388c1bf0cfc76901782520c5af58f73b2649a4c0 Mon Sep 17 00:00:00 2001
From: Alan Maguire <alan.maguire@...cle.com>
Date: Wed, 20 Mar 2019 14:12:36 +0100
Subject: [PATCH] bpf: add bpf_skb_set_inner_header helper

This work adds a helper which can mark inner mac and network
headers and sets inner protocol type for the relevant skb such
that generic segmentation offload (GSO) can segment the packet
appropriately while taking newly-added encapsulation headers into
account.

It is intended to be used in conjunction with other helpers such
as bpf_skb_adjust_room for cases where custom encapsulation is
implemented in a tc BPF program and GSO functionality is needed.
Currently UDP and GRE encapsulation are supported.

Signed-off-by: Alan Maguire <alan.maguire@...cle.com>
---
 include/uapi/linux/bpf.h | 23 ++++++++++++++++-
 net/core/filter.c        | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 929c8e5..3ce3c16 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2431,6 +2431,22 @@ struct bpf_stack_build_id {
  *	Return
  *		A **struct bpf_sock** pointer on success, or **NULL** in
  *		case of failure.
+ *
+ * int bpf_skb_set_inner_header(skb, outer_proto, inner_proto, inner_offset,
+ *				flags)
+ *	Description
+ *		Set inner header at *inner_offset* for specified *inner_proto*
+ *		for tunnel of type *outer_proto*. *outer_proto* must be one
+ *		of **IPPROTO_UDP** or **IPPROTO_GRE**.  *inner_proto* must be
+ *		**ETH_P_IP** or **ETH_P_IPV6**.  *inner_offset* should specify
+ *		offset of the relevant inner header, or should be 0 to reset
+ *		inner headers. *flags* should be a combination of
+ *
+ *			* **BPF_F_L2_INNER_OFFSET** offset is inner mac header
+ *			* **BPF_F_L3_INNER_OFFSET** offset is inner network
+ *			  header
+ *	Return
+ *		0 on success, or negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2531,7 +2547,8 @@ struct bpf_stack_build_id {
 	FN(sk_fullsock),		\
 	FN(tcp_sock),			\
 	FN(skb_ecn_set_ce),		\
-	FN(get_listener_sock),
+	FN(get_listener_sock),		\
+	FN(skb_set_inner_header),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2595,6 +2612,10 @@ enum bpf_adj_room_mode {
 	BPF_ADJ_ROOM_NET,
 };
 
+/* BPF_FUNC_skb_set_inner_header flags. */
+#define	BPF_F_L2_INNER_OFFSET		(1ULL << 0)
+#define	BPF_F_L3_INNER_OFFSET		(1ULL << 1)
+
 /* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
 enum bpf_hdr_start_off {
 	BPF_HDR_START_MAC,
diff --git a/net/core/filter.c b/net/core/filter.c
index 647c63a..f8265f3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5463,6 +5463,68 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
 };
 #endif /* CONFIG_INET */
 
+BPF_CALL_5(bpf_skb_set_inner_header, struct sk_buff *, skb,
+	   __be16, outer_proto, __be16, inner_proto, u32, inner_offset,
+	   u64, flags)
+{
+	unsigned int gso_type;
+
+	if (unlikely(inner_offset > skb->len))
+		return -EINVAL;
+
+	if (unlikely(flags & ~(BPF_F_L2_INNER_OFFSET | BPF_F_L3_INNER_OFFSET)))
+		return -EINVAL;
+
+	switch (outer_proto) {
+	case IPPROTO_UDP:
+		gso_type = SKB_GSO_UDP_TUNNEL;
+		break;
+	case IPPROTO_GRE:
+		gso_type = SKB_GSO_GRE;
+		break;
+	default:
+		return -ENOTSUPP;
+	}
+
+	switch (inner_proto) {
+	case ETH_P_IP:
+	case ETH_P_IPV6:
+		break;
+	default:
+		return -ENOTSUPP;
+	}
+
+	if (inner_offset == 0) {
+		skb->encapsulation = 0;
+		skb_reset_inner_headers(skb);
+		return 0;
+	}
+
+	if (flags & BPF_F_L2_INNER_OFFSET)
+		skb_set_inner_mac_header(skb, inner_offset);
+	if (flags & BPF_F_L3_INNER_OFFSET)
+		skb_set_inner_network_header(skb, inner_offset);
+
+	skb_set_inner_protocol(skb, htons(inner_proto));
+	skb->encapsulation = 1;
+
+	if (skb_is_gso(skb))
+		skb_shinfo(skb)->gso_type |= gso_type;
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_skb_set_inner_header_proto = {
+	.func		= bpf_skb_set_inner_header,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_ANYTHING,
+	.arg5_type	= ARG_ANYTHING,
+};
+
 bool bpf_helper_changes_pkt_data(void *func)
 {
 	if (func == bpf_skb_vlan_push ||
@@ -5720,6 +5782,8 @@ bool bpf_helper_changes_pkt_data(void *func)
 	case BPF_FUNC_get_listener_sock:
 		return &bpf_get_listener_sock_proto;
 #endif
+	case BPF_FUNC_skb_set_inner_header:
+		return &bpf_skb_set_inner_header_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
-- 
1.8.3.1


> Signed-off-by: Willem de Bruijn <willemb@...gle.com>
> ---
>  include/uapi/linux/bpf.h | 14 +++++++++-
>  net/core/filter.c        | 58 +++++++++++++++++++++++++++++++++++++---
>  2 files changed, 67 insertions(+), 5 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 0eda8f564a381..a444534cc88d7 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2595,7 +2595,19 @@ enum bpf_func_id {
>  
>  /* BPF_FUNC_skb_adjust_room flags. */
>  #define BPF_F_ADJ_ROOM_FIXED_GSO	(1ULL << 0)
> -#define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO)
> +
> +#define BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	(1ULL << 1)
> +#define BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	(1ULL << 2)
> +#define BPF_F_ADJ_ROOM_ENCAP_L3_MASK	(BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 | \
> +					 BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
> +
> +#define BPF_F_ADJ_ROOM_ENCAP_L4_GRE	(1ULL << 3)
> +#define BPF_F_ADJ_ROOM_ENCAP_L4_UDP	(1ULL << 4)
> +
> +#define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO | \
> +					 BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
> +					 BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
> +					 BPF_F_ADJ_ROOM_ENCAP_L4_UDP)
>  
>  /* Mode for BPF_FUNC_skb_adjust_room helper. */
>  enum bpf_adj_room_mode {
> diff --git a/net/core/filter.c b/net/core/filter.c
> index e346e48098000..6007b0b4bc0d7 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2966,6 +2966,9 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
>  static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>  			    u64 flags)
>  {
> +	bool encap = flags & BPF_F_ADJ_ROOM_ENCAP_L3_MASK;
> +	unsigned int gso_type = SKB_GSO_DODGY;
> +	u16 mac_len, inner_net, inner_trans;
>  	int ret;
>  
>  	if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
> @@ -2979,10 +2982,60 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>  	if (unlikely(ret < 0))
>  		return ret;
>  
> +	if (encap) {
> +		if (skb->protocol != htons(ETH_P_IP) &&
> +		    skb->protocol != htons(ETH_P_IPV6))
> +			return -ENOTSUPP;
> +
> +		if (flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 &&
> +		    flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
> +			return -EINVAL;
> +
> +		if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_GRE &&
> +		    flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP)
> +			return -EINVAL;
> +
> +		if (skb->encapsulation)
> +			return -EALREADY;
> +
> +		mac_len = skb->network_header - skb->mac_header;
> +		inner_net = skb->network_header;
> +		inner_trans = skb->transport_header;
> +	}
> +
>  	ret = bpf_skb_net_hdr_push(skb, off, len_diff);
>  	if (unlikely(ret < 0))
>  		return ret;
>  
> +	if (encap) {
> +		/* inner mac == inner_net on l3 encap */
> +		skb->inner_mac_header = inner_net;
> +		skb->inner_network_header = inner_net;
> +		skb->inner_transport_header = inner_trans;
> +		skb_set_inner_protocol(skb, skb->protocol);
> +
> +		skb->encapsulation = 1;
> +		skb_set_network_header(skb, mac_len);
> +
> +		if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP)
> +			gso_type |= SKB_GSO_UDP_TUNNEL;
> +		else if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_GRE)
> +			gso_type |= SKB_GSO_GRE;
> +		else if (flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
> +			gso_type |= SKB_GSO_IPXIP6;
> +		else
> +			gso_type |= SKB_GSO_IPXIP4;
> +
> +		if (flags & BPF_F_ADJ_ROOM_ENCAP_L4_GRE ||
> +		    flags & BPF_F_ADJ_ROOM_ENCAP_L4_UDP) {
> +			int nh_len = flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 ?
> +					sizeof(struct ipv6hdr) :
> +					sizeof(struct iphdr);
> +
> +			skb_set_transport_header(skb, mac_len + nh_len);
> +		}
> +	}
> +
>  	if (skb_is_gso(skb)) {
>  		struct skb_shared_info *shinfo = skb_shinfo(skb);
>  
> @@ -2991,7 +3044,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>  			skb_decrease_gso_size(shinfo, len_diff);
>  
>  		/* Header must be checked, and gso_segs recomputed. */
> -		shinfo->gso_type |= SKB_GSO_DODGY;
> +		shinfo->gso_type |= gso_type;
>  		shinfo->gso_segs = 0;
>  	}
>  
> @@ -3042,7 +3095,6 @@ static u32 __bpf_skb_max_len(const struct sk_buff *skb)
>  BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
>  	   u32, mode, u64, flags)
>  {
> -	bool trans_same = skb->transport_header == skb->network_header;
>  	u32 len_cur, len_diff_abs = abs(len_diff);
>  	u32 len_min = bpf_skb_net_base_len(skb);
>  	u32 len_max = __bpf_skb_max_len(skb);
> @@ -3071,8 +3123,6 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
>  	}
>  
>  	len_cur = skb->len - skb_network_offset(skb);
> -	if (skb_transport_header_was_set(skb) && !trans_same)
> -		len_cur = skb_network_header_len(skb);
>  	if ((shrink && (len_diff_abs >= len_cur ||
>  			len_cur - len_diff_abs < len_min)) ||
>  	    (!shrink && (skb->len + len_diff_abs > len_max &&
> -- 
> 2.21.0.225.g810b269d1ac-goog
> 
>