netdev - Re: [PATCH bpf-next RFC 2/2] Add MPLS label push/pop functions for EBPF

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Thu, 29 Mar 2018 10:35:22 -0700
From:   Shrijeet mukherjee <shrijeetoss@...il.com>
To:     David Ahern <dsa@...ulusnetworks.com>
Cc:     daniel@...earbox.net, netdev@...r.kernel.org,
        roopa@...ulusnetworks.com, ast@...com
Subject: Re: [PATCH bpf-next RFC 2/2] Add MPLS label push/pop functions for
 EBPF

On 03/28, David Ahern wrote:
> On 3/28/18 10:27 AM, Shrijeet Mukherjee wrote:
> 
> > +static int bpf_skb_mpls_net_grow(struct sk_buff *skb, int len_diff)
> > +{
> > +	u32 off = skb_mac_header_len(skb); /*LL_RESERVED_SPACE ?? */
> > +	int ret;
> > +
> > +	ret = skb_cow(skb, len_diff);
> > +	if (unlikely(ret < 0))
> > +		return ret;
> > +
> > +	skb_set_inner_protocol(skb, skb->protocol);
> > +	skb_reset_inner_network_header(skb);
> > +
> > +	ret = bpf_skb_generic_push(skb, off, len_diff);
> > +	if (unlikely(ret < 0))
> > +		return ret;
> > +
> > +	skb_reset_mac_header(skb);
> > +	skb_set_network_header(skb, ETH_HLEN);
> > +	skb->protocol = eth_hdr(skb)->h_proto = htons(ETH_P_MPLS_UC);
> > +
> > +	if (skb_is_gso(skb)) {
> > +/* Due to header grow, MSS needs to be downgraded. */
> > +		skb_shinfo(skb)->gso_size -= len_diff;
> > +/* Header must be checked, and gso_segs recomputed. */
> > +		skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
> > +		skb_shinfo(skb)->gso_segs = 0;
> > +	}
> > +
> > +	bpf_compute_data_pointers(skb);
> > +	return 0;
> > +}
> 
> The only thing MPLS specific is the protocol setting. Seems to me you
> could leverage the existing bpf_skb_net_grow or even bpf_skb_adjust_net
> and then set the protocols and headers in the caller of this function.
> You already do something similar in bpf_skb_mpls_pop.
> 

At a high level, this is the discussion I want to have. There is a loss of efficiency if we try to make generic functions and there are differences which are subtle as MPLS does not have an encompassing header. So, do we want the filter functions to become more common (and I can generate a patch to that point) or do we want them to be hyper-efficient is the question I am seeking an answer to.

Will address all the other comments in v2
> 
> > +
> > +BPF_CALL_2(bpf_skb_mpls_pop, struct sk_buff*, skb, u32, num_lbls)
> > +{
> > +	u32 i = 0;
> > +	struct mpls_shim_hdr *hdr;
> > +	unsigned char *cursor;
> > +	struct mpls_entry_decoded dec;
> > +
> > +	if (num_lbls == 0)
> > +		return 0;
> > +
> > +	cursor = skb_network_header(skb);
> > +	do {
> > +		hdr = (struct mpls_shim_hdr *)cursor;
> > +		dec = mpls_entry_decode(hdr);
> > +		i++; cursor = cursor + sizeof(struct mpls_shim_hdr);
> > +	} while (dec.bos != 1 && i < num_lbls);
> > +
> > +	bpf_push_mac_rcsum(skb);
> > +	skb_pull(skb, i * sizeof(struct mpls_shim_hdr));
> > +	skb_reset_network_header(skb);
> > +
> > +	skb->protocol = eth_hdr(skb)->h_proto = htons(ETH_P_MPLS_UC);
> 
> If I pop 1 label and there are more of them, then the protocol does not
> need to be adjusted -- it was already set to MPLS.
> 
> If I pop all of the labels, then the protocol is not MPLS, but whatever
> the inner packet is -- IPv4 or IPv6 or other.

This is the other real question. The filter functions do not allow a generic array to be returned, so the current expectation is that the callee has parsed the full MPLS label stack and provides the correct number. This could mean that MPLS pop is just a generic reduce_hdr_by function and the protocol etc is handled by the bpf program. I don't like that as it will mean incoherent protocol handling in bpf programs. But wanted to hear opinions.

> 
> 
> > +	bpf_pull_mac_rcsum(skb);
> > +	bpf_compute_data_pointers(skb);
> > +
> > +	return i;
> > +}
> > +
> > +const struct bpf_func_proto bpf_skb_mpls_pop_proto = {
> > +				.func   = bpf_skb_mpls_pop,
> > +				.gpl_only = false,
> > +				.ret_type = RET_INTEGER,
> > +				.arg1_type  = ARG_PTR_TO_CTX,
> > +				.arg2_type  = ARG_ANYTHING,
> > +};
> > +EXPORT_SYMBOL_GPL(bpf_skb_mpls_pop_proto);
> > +
> > +BPF_CALL_3(bpf_skb_mpls_push, struct sk_buff*, skb,
> > +	   __be32*, lbls, u32, num_lbls)
> > +{
> > +	int ret, i;
> > +	unsigned int new_header_size = num_lbls * sizeof(__be32);
> > +	unsigned int ttl = 255;
> > +	struct dst_entry *dst = skb_dst(skb);
> 
> dst is not always set. e.g., any program on ingress prior to the layer 3
> will not have it.
> 
> 
> > +	struct net_device *out_dev = dst->dev;
> > +	struct mpls_shim_hdr *hdr;
> > +	bool bos;
> > +
> > +	/* Ensure there is enough space for the headers in the skb */
> > +	ret = bpf_skb_mpls_net_grow(skb, new_header_size);
> > +	if (ret < 0) {
> > +		trace_printk("COW was killed\n");
> > +		bpf_compute_data_pointers(skb);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	skb->dev = out_dev;
> > +/* XXX this may need finesse to integrate with
> > + * global TTL values for MPLS
> > + */
> > +	if (dst->ops->family == AF_INET)
> > +		ttl = ip_hdr(skb)->ttl;
> > +	else if (dst->ops->family == AF_INET6)
> > +		ttl = ipv6_hdr(skb)->hop_limit;
> > +
> > +	/* Push the new labels */
> > +	hdr = skb_mpls_hdr(skb);
> > +	bos = true;
> > +	for (i = num_lbls - 1; i >= 0; i--) {
> > +		hdr[i] = mpls_entry_encode(lbls[i], ttl, 0, bos);
> > +		bos = false;
> > +	}
> > +
> > +	bpf_compute_data_pointers(skb);
> > +	return 0;
> > +}
> > +
> > +const struct bpf_func_proto bpf_skb_mpls_push_proto = {
> > +	.func		= bpf_skb_mpls_push,
> > +	.gpl_only	= false,
> > +	.ret_type	= RET_INTEGER,
> > +	.arg1_type	= ARG_PTR_TO_CTX,
> > +	.arg2_type	= ARG_PTR_TO_MEM,
> > +	.arg3_type	= ARG_CONST_SIZE,
> > +};
> > +EXPORT_SYMBOL_GPL(bpf_skb_mpls_push_proto);
> > +
> >  static u32 __bpf_skb_min_len(const struct sk_buff *skb)
> >  {
> >  	u32 min_len = skb_network_offset(skb);
> > @@ -3019,6 +3138,8 @@ bool bpf_helper_changes_pkt_data(void *func)
> >  {
> >  	if (func == bpf_skb_vlan_push ||
> >  	    func == bpf_skb_vlan_pop ||
> > +	    func == bpf_skb_mpls_push ||
> > +	    func == bpf_skb_mpls_pop ||
> >  	    func == bpf_skb_store_bytes ||
> >  	    func == bpf_skb_change_proto ||
> >  	    func == bpf_skb_change_head ||
> > @@ -3682,6 +3803,10 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
> >  		return &bpf_skb_vlan_push_proto;
> >  	case BPF_FUNC_skb_vlan_pop:
> >  		return &bpf_skb_vlan_pop_proto;
> > +	case BPF_FUNC_skb_mpls_push:
> > +		return &bpf_skb_mpls_push_proto;
> > +	case BPF_FUNC_skb_mpls_pop:
> > +		return &bpf_skb_mpls_pop_proto;
> >  	case BPF_FUNC_skb_change_proto:
> >  		return &bpf_skb_change_proto_proto;
> >  	case BPF_FUNC_skb_change_type:
> > 
>