Date:   Wed, 7 Oct 2020 23:37:00 +0200
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Jesper Dangaard Brouer <brouer@...hat.com>, bpf@...r.kernel.org
Cc:     netdev@...r.kernel.org, Daniel Borkmann <borkmann@...earbox.net>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        maze@...gle.com, lmb@...udflare.com, shaun@...era.io,
        Lorenzo Bianconi <lorenzo@...nel.org>, marek@...udflare.com,
        John Fastabend <john.fastabend@...il.com>,
        Jakub Kicinski <kuba@...nel.org>, eyal.birger@...il.com
Subject: Re: [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets
 after egress hook

On 10/7/20 6:23 PM, Jesper Dangaard Brouer wrote:
[...]
>   net/core/dev.c |   24 ++++++++++++++++++++++--
>   1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b433098896b2..19406013f93e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3870,6 +3870,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
>   	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
>   	case TC_ACT_OK:
>   	case TC_ACT_RECLASSIFY:
> +		*ret = NET_XMIT_SUCCESS;
>   		skb->tc_index = TC_H_MIN(cl_res.classid);
>   		break;
>   	case TC_ACT_SHOT:
> @@ -4064,9 +4065,12 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>   {
>   	struct net_device *dev = skb->dev;
>   	struct netdev_queue *txq;
> +#ifdef CONFIG_NET_CLS_ACT
> +	bool mtu_check = false;
> +#endif
> +	bool again = false;
>   	struct Qdisc *q;
>   	int rc = -ENOMEM;
> -	bool again = false;
>   
>   	skb_reset_mac_header(skb);
>   
> @@ -4082,14 +4086,28 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>   
>   	qdisc_pkt_len_init(skb);
>   #ifdef CONFIG_NET_CLS_ACT
> +	mtu_check = skb_is_redirected(skb);
>   	skb->tc_at_ingress = 0;
>   # ifdef CONFIG_NET_EGRESS
>   	if (static_branch_unlikely(&egress_needed_key)) {
> +		unsigned int len_orig = skb->len;
> +
>   		skb = sch_handle_egress(skb, &rc, dev);
>   		if (!skb)
>   			goto out;
> +		/* BPF-prog ran and could have changed packet size beyond MTU */
> +		if (rc == NET_XMIT_SUCCESS && skb->len > len_orig)
> +			mtu_check = true;
>   	}
>   # endif
> +	/* MTU-check only happens on "last" net_device in a redirect sequence
> +	 * (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it
> +	 * either ingress or egress to another device).
> +	 */

Hmm, quite some overhead in the fast path. Also, won't this be checked multiple
times on stacked devices? :( Moreover, this misses the fact that 'real' qdiscs
can have filters attached too, and those would run after this check. Couldn't
this instead live in the driver layer for those that really need it? I would
probably just drop the check as done in 1/6 and allow the BPF prog to do the
validation itself if needed.

> +	if (mtu_check && !is_skb_forwardable(dev, skb)) {
> +		rc = -EMSGSIZE;
> +		goto drop;
> +	}
>   #endif
>   	/* If device/qdisc don't need skb->dst, release it right now while
>   	 * its hot in this cpu cache.
> @@ -4157,7 +4175,9 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>   
>   	rc = -ENETDOWN;
>   	rcu_read_unlock_bh();
> -
> +#ifdef CONFIG_NET_CLS_ACT
> +drop:
> +#endif
>   	atomic_long_inc(&dev->tx_dropped);
>   	kfree_skb_list(skb);
>   	return rc;
> 
