Message-ID: <CANP3RGdq-irQ7w8=1xWNPh0Fn+72d9wrKR24vQJTFMa8w4+b6w@mail.gmail.com>
Date: Fri, 9 Oct 2020 16:29:46 -0700
From: Maciej Żenczykowski <maze@...gle.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: bpf <bpf@...r.kernel.org>, Linux NetDev <netdev@...r.kernel.org>,
Daniel Borkmann <borkmann@...earbox.net>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Lorenz Bauer <lmb@...udflare.com>,
Shaun Crampton <shaun@...era.io>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Marek Majkowski <marek@...udflare.com>,
John Fastabend <john.fastabend@...il.com>,
Jakub Kicinski <kuba@...nel.org>,
Eyal Birger <eyal.birger@...il.com>
Subject: Re: [PATCH bpf-next V3 3/6] bpf: add BPF-helper for MTU checking
On Thu, Oct 8, 2020 at 7:09 AM Jesper Dangaard Brouer <brouer@...hat.com> wrote:
>
> This BPF-helper bpf_mtu_check() works for both XDP and TC-BPF programs.
bpf_check_mtu() seems a better name.
>
> The API is designed to help the BPF-programmer, that want to do packet
> context size changes, which involves other helpers. These other helpers
> usually does a delta size adjustment. This helper also support a delta
> size (len_diff), which allow BPF-programmer to reuse arguments needed by
> these other helpers, and perform the MTU check prior to doing any actual
> size adjustment of the packet context.
>
> V3: Take L2/ETH_HLEN header size into account and document it.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> ---
> include/uapi/linux/bpf.h | 63 +++++++++++++++++++++
> net/core/filter.c | 119 ++++++++++++++++++++++++++++++++++++++++
> tools/include/uapi/linux/bpf.h | 63 +++++++++++++++++++++
> 3 files changed, 245 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4a46a1de6d16..1dcf5d8195f4 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3718,6 +3718,56 @@ union bpf_attr {
> * never return NULL.
> * Return
> * A pointer pointing to the kernel percpu variable on this cpu.
> + *
> + * int bpf_mtu_check(void *ctx, u32 ifindex, u32 *mtu_result, s32 len_diff, u64 flags)
> + * Description
> + * Check ctx packet size against MTU of net device (based on
> + * *ifindex*). This helper will likely be used in combination with
> + * helpers that adjust/change the packet size. The argument
> + * *len_diff* can be used for querying with a planned size
> + * change. This allows to check MTU prior to changing packet ctx.
> + *
> + * The Linux kernel route table can configure MTUs on a more
> + * specific per route level, which is not provided by this helper.
> + * For route level MTU checks use the **bpf_fib_lookup**\ ()
> + * helper.
> + *
> + * *ctx* is either **struct xdp_md** for XDP programs or
> + * **struct sk_buff** for tc cls_act programs.
> + *
> + * The *flags* argument can be a combination of one or more of the
> + * following values:
> + *
> + * **BPF_MTU_CHK_RELAX**
> + * This flag relax or increase the MTU with room for one
> + * VLAN header (4 bytes) and take into account net device
> + * hard_header_len. This relaxation is also used by the
> + * kernels own forwarding MTU checks.
> + *
> + * **BPF_MTU_CHK_GSO**
> + * This flag will only works for *ctx* **struct sk_buff**.
> + * If packet context contains extra packet segment buffers
> + * (often knows as frags), then those are also checked
> + * against the MTU size.
The naming is weird... what does GSO have to do with frags?
Aren't these orthogonal things?
> + *
> + * The *mtu_result* pointer contains the MTU value of the net
> + * device including the L2 header size (usually 14 bytes Ethernet
> + * header). The net device configured MTU is the L3 size, but as
> + * XDP and TX length operate at L2 this helper include L2 header
> + * size in reported MTU.
> + *
> + * Return
> + * * 0 on success, and populate MTU value in *mtu_result* pointer.
> + *
> + * * < 0 if any input argument is invalid (*mtu_result* not updated)
not -EINVAL?
> + *
> + * MTU violations return positive values, but also populate MTU
> + * value in *mtu_result* pointer, as this can be needed for
> + * implemeting PMTU handing:
implementing
> + *
> + * * **BPF_MTU_CHK_RET_FRAG_NEEDED**
> + * * **BPF_MTU_CHK_RET_GSO_TOOBIG**
> + *
> */
> #define __BPF_FUNC_MAPPER(FN) \
> FN(unspec), \
> @@ -3875,6 +3925,7 @@ union bpf_attr {
> FN(redirect_neigh), \
> FN(bpf_per_cpu_ptr), \
> FN(bpf_this_cpu_ptr), \
> + FN(mtu_check), \
> /* */
>
> /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> @@ -4889,6 +4940,18 @@ struct bpf_fib_lookup {
> __u8 dmac[6]; /* ETH_ALEN */
> };
>
> +/* bpf_mtu_check flags*/
> +enum bpf_mtu_check_flags {
> + BPF_MTU_CHK_RELAX = (1U << 0),
> + BPF_MTU_CHK_GSO = (1U << 1),
> +};
> +
> +enum bpf_mtu_check_ret {
> + BPF_MTU_CHK_RET_SUCCESS, /* check and lookup successful */
> + BPF_MTU_CHK_RET_FRAG_NEEDED, /* fragmentation required to fwd */
> + BPF_MTU_CHK_RET_GSO_TOOBIG, /* GSO re-segmentation needed to fwd */
> +};
> +
> enum bpf_task_fd_type {
> BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */
> BPF_FD_TYPE_TRACEPOINT, /* tp name */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index da74d6ddc4d7..5986156e700e 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5513,6 +5513,121 @@ static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
> .arg4_type = ARG_ANYTHING,
> };
>
> +static int bpf_mtu_lookup(struct net *netns, u32 ifindex, u64 flags)
bpf_lookup_mtu() ???
> +{
> + struct net_device *dev;
> + int mtu;
> +
> + dev = dev_get_by_index_rcu(netns, ifindex);
My understanding is this lookup is a bit of a perf hit. Maybe ifindex 0
could mean "use skb->dev"? Or have a bpf_lookup_mtu(skb) variant as well?
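Untested sketch of the ifindex == 0 idea, assuming the caller's device
(a hypothetical ctx_dev, i.e. skb->dev or xdp->rxq->dev) gets passed
down alongside netns:

        if (ifindex) {
                dev = dev_get_by_index_rcu(netns, ifindex);
                if (!dev)
                        return -ENODEV;
        } else {
                /* fall back to the device the packet is already on */
                dev = ctx_dev;
        }
        /* XDP+TC len is L2: add L2-header as dev MTU is L3 size */
        mtu = dev->mtu + dev->hard_header_len;

That would avoid the ifindex hash lookup for the common case of
checking against the device the packet is already on.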
> + if (!dev)
> + return -ENODEV;
> +
> + /* XDP+TC len is L2: Add L2-header as dev MTU is L3 size */
> + mtu = dev->mtu + dev->hard_header_len;
> +
> + /* Same relax as xdp_ok_fwd_dev() and is_skb_forwardable() */
> + if (flags & BPF_MTU_CHK_RELAX)
Could this check the device's VLAN TX offload state instead?
> + mtu += VLAN_HLEN;
> +
> + return mtu;
> +}
> +
> +static unsigned int __bpf_len_adjust_positive(unsigned int len, int len_diff)
> +{
> + int len_new = len + len_diff; /* notice len_diff can be negative */
> +
> + if (len_new > 0)
> + return len_new;
> +
> + return 0;
Not return len?
Oh, I see: the function doesn't do what the name implies...
Not sure this func is helpful... why not simply do
int len_new = (int)len + (int)len_diff; directly down below and check
for < 0 there?
A >2GB skb->len is meaningless anyway.
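Untested sketch of what I mean, folded directly into
bpf_skb_mtu_check() (returning -EINVAL on a negative result is just
one option):

        int len_new = (int)skb->len + len_diff;

        if (len_new < 0)
                return -EINVAL;
        if (len_new > mtu) {
                ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
                goto out;
        }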
> +}
> +
> +BPF_CALL_5(bpf_skb_mtu_check, struct sk_buff *, skb,
> + u32, ifindex, u32 *, mtu_result, s32, len_diff, u64, flags)
> +{
> + struct net *netns = dev_net(skb->dev);
> + int ret = BPF_MTU_CHK_RET_SUCCESS;
> + unsigned int len = skb->len;
> + int mtu;
> +
> + if (flags & ~(BPF_MTU_CHK_RELAX | BPF_MTU_CHK_GSO))
> + return -EINVAL;
> +
> + mtu = bpf_mtu_lookup(netns, ifindex, flags);
> + if (unlikely(mtu < 0))
> + return mtu; /* errno */
> +
> + len = __bpf_len_adjust_positive(len, len_diff);
> + if (len > mtu) {
> + ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
Can't this fail if skb->len covers the entire packet, GSO is on, and
the packet is greater than the MTU while the GSO segment size is smaller?
Think of a 200-byte GSO packet with two 100-byte segs and a 150-byte MTU.
Does GSO actually require frags? [As you can tell I don't have a good
handle on GSO vs frags vs skb->len, so maybe what I'm asking is bogus.]
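To make that concrete, an untested sketch of the ordering I would have
expected, keeping the patch's skb_gso_validate_network_len() call but
assuming it is meant to be negated (it returns true when the segments
fit within the given mtu):

        len = __bpf_len_adjust_positive(skb->len, len_diff);
        if (len > mtu) {
                if (skb_is_gso(skb)) {
                        /* e.g. 200-byte skb, two 100-byte segs,
                         * 150-byte MTU: segments fit, not a violation */
                        if (!skb_gso_validate_network_len(skb, mtu))
                                ret = BPF_MTU_CHK_RET_GSO_TOOBIG;
                } else {
                        ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
                }
        }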
> + goto out;
> + }
> +
> + if (flags & BPF_MTU_CHK_GSO &&
> + skb_is_gso(skb) &&
> + skb_gso_validate_network_len(skb, mtu)) {
> + ret = BPF_MTU_CHK_RET_GSO_TOOBIG;
> + goto out;
> + }
> +
> +out:
> + if (mtu_result)
> + *mtu_result = mtu;
> +
> + return ret;
> +}
> +
> +BPF_CALL_5(bpf_xdp_mtu_check, struct xdp_buff *, xdp,
> + u32, ifindex, u32 *, mtu_result, s32, len_diff, u64, flags)
> +{
> + unsigned int len = xdp->data_end - xdp->data;
> + struct net_device *dev = xdp->rxq->dev;
> + struct net *netns = dev_net(dev);
> + int ret = BPF_MTU_CHK_RET_SUCCESS;
> + int mtu;
> +
> + /* XDP variant doesn't support multi-buffer segment check (yet) */
> + if (flags & ~BPF_MTU_CHK_RELAX)
> + return -EINVAL;
> +
> + mtu = bpf_mtu_lookup(netns, ifindex, flags);
> + if (unlikely(mtu < 0))
> + return mtu; /* errno */
> +
> + len = __bpf_len_adjust_positive(len, len_diff);
> + if (len > mtu) {
> + ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
> + goto out;
> + }
> +out:
> + if (mtu_result)
> + *mtu_result = mtu;
> +
> + return ret;
> +}
> +
> +static const struct bpf_func_proto bpf_skb_mtu_check_proto = {
> + .func = bpf_skb_mtu_check,
> + .gpl_only = true,
> + .ret_type = RET_INTEGER,
> + .arg1_type = ARG_PTR_TO_CTX,
> + .arg2_type = ARG_ANYTHING,
> + .arg3_type = ARG_PTR_TO_MEM,
> + .arg4_type = ARG_ANYTHING,
> + .arg5_type = ARG_ANYTHING,
> +};
> +
> +static const struct bpf_func_proto bpf_xdp_mtu_check_proto = {
> + .func = bpf_xdp_mtu_check,
> + .gpl_only = true,
> + .ret_type = RET_INTEGER,
> + .arg1_type = ARG_PTR_TO_CTX,
> + .arg2_type = ARG_ANYTHING,
> + .arg3_type = ARG_PTR_TO_MEM,
> + .arg4_type = ARG_ANYTHING,
> + .arg5_type = ARG_ANYTHING,
> +};
> +
> #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
> static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
> {
> @@ -7076,6 +7191,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> return &bpf_get_socket_uid_proto;
> case BPF_FUNC_fib_lookup:
> return &bpf_skb_fib_lookup_proto;
> + case BPF_FUNC_mtu_check:
> + return &bpf_skb_mtu_check_proto;
> case BPF_FUNC_sk_fullsock:
> return &bpf_sk_fullsock_proto;
> case BPF_FUNC_sk_storage_get:
> @@ -7145,6 +7262,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> return &bpf_xdp_adjust_tail_proto;
> case BPF_FUNC_fib_lookup:
> return &bpf_xdp_fib_lookup_proto;
> + case BPF_FUNC_mtu_check:
> + return &bpf_xdp_mtu_check_proto;
> #ifdef CONFIG_INET
> case BPF_FUNC_sk_lookup_udp:
> return &bpf_xdp_sk_lookup_udp_proto;
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 4a46a1de6d16..1dcf5d8195f4 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -3718,6 +3718,56 @@ union bpf_attr {
> * never return NULL.
> * Return
> * A pointer pointing to the kernel percpu variable on this cpu.
> + *
> + * int bpf_mtu_check(void *ctx, u32 ifindex, u32 *mtu_result, s32 len_diff, u64 flags)
> + * Description
> + * Check ctx packet size against MTU of net device (based on
> + * *ifindex*). This helper will likely be used in combination with
> + * helpers that adjust/change the packet size. The argument
> + * *len_diff* can be used for querying with a planned size
> + * change. This allows to check MTU prior to changing packet ctx.
> + *
> + * The Linux kernel route table can configure MTUs on a more
> + * specific per route level, which is not provided by this helper.
> + * For route level MTU checks use the **bpf_fib_lookup**\ ()
> + * helper.
> + *
> + * *ctx* is either **struct xdp_md** for XDP programs or
> + * **struct sk_buff** for tc cls_act programs.
> + *
> + * The *flags* argument can be a combination of one or more of the
> + * following values:
> + *
> + * **BPF_MTU_CHK_RELAX**
> + * This flag relax or increase the MTU with room for one
> + * VLAN header (4 bytes) and take into account net device
> + * hard_header_len. This relaxation is also used by the
> + * kernels own forwarding MTU checks.
> + *
> + * **BPF_MTU_CHK_GSO**
> + * This flag will only works for *ctx* **struct sk_buff**.
> + * If packet context contains extra packet segment buffers
> + * (often knows as frags), then those are also checked
> + * against the MTU size.
> + *
> + * The *mtu_result* pointer contains the MTU value of the net
> + * device including the L2 header size (usually 14 bytes Ethernet
> + * header). The net device configured MTU is the L3 size, but as
> + * XDP and TX length operate at L2 this helper include L2 header
> + * size in reported MTU.
> + *
> + * Return
> + * * 0 on success, and populate MTU value in *mtu_result* pointer.
> + *
> + * * < 0 if any input argument is invalid (*mtu_result* not updated)
> + *
> + * MTU violations return positive values, but also populate MTU
> + * value in *mtu_result* pointer, as this can be needed for
> + * implemeting PMTU handing:
> + *
> + * * **BPF_MTU_CHK_RET_FRAG_NEEDED**
> + * * **BPF_MTU_CHK_RET_GSO_TOOBIG**
> + *
> */
> #define __BPF_FUNC_MAPPER(FN) \
> FN(unspec), \
> @@ -3875,6 +3925,7 @@ union bpf_attr {
> FN(redirect_neigh), \
> FN(bpf_per_cpu_ptr), \
> FN(bpf_this_cpu_ptr), \
> + FN(mtu_check), \
> /* */
>
> /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> @@ -4889,6 +4940,18 @@ struct bpf_fib_lookup {
> __u8 dmac[6]; /* ETH_ALEN */
> };
>
> +/* bpf_mtu_check flags*/
> +enum bpf_mtu_check_flags {
> + BPF_MTU_CHK_RELAX = (1U << 0),
> + BPF_MTU_CHK_GSO = (1U << 1),
> +};
> +
> +enum bpf_mtu_check_ret {
> + BPF_MTU_CHK_RET_SUCCESS, /* check and lookup successful */
> + BPF_MTU_CHK_RET_FRAG_NEEDED, /* fragmentation required to fwd */
> + BPF_MTU_CHK_RET_GSO_TOOBIG, /* GSO re-segmentation needed to fwd */
> +};
> +
> enum bpf_task_fd_type {
> BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */
> BPF_FD_TYPE_TRACEPOINT, /* tp name */