netdev - Re: [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c2b40603-6b42-7c87-49cf-f139e80dfd8a@iogearbox.net>
Date:   Fri, 29 Jun 2018 09:04:15 +0200
From:   Daniel Borkmann <borkmann@...earbox.net>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>,
        Jiri Benc <jbenc@...hat.com>
Cc:     davem@...emloft.net, Roopa Prabhu <roopa@...ulusnetworks.com>,
        jiri@...nulli.us, jhs@...atatu.com, xiyou.wangcong@...il.com,
        oss-drivers@...ronome.com, netdev@...r.kernel.org,
        Pieter Jansen van Vuuren 
        <pieter.jansenvanvuuren@...ronome.com>
Subject: Re: [PATCH net-next v2 3/4] net: check tunnel option type in tunnel
 flags

On 06/28/2018 06:54 PM, Jakub Kicinski wrote:
> On Thu, 28 Jun 2018 09:42:06 +0200, Jiri Benc wrote:
>> On Wed, 27 Jun 2018 11:49:49 +0200, Daniel Borkmann wrote:
>>> Looks good to me, and yes in BPF case a mask like TUNNEL_OPTIONS_PRESENT is
>>> right approach since this is opaque info and solely defined by the BPF prog
>>> that is using the generic helper.  
>>
>> Wouldn't it make sense to introduce some safeguards here (in a backward
>> compatible way, of course)? It's easy to mistakenly set data for a
>> different tunnel type in a BPF program and then be surprised by the
>> result. It might help users if such usage was detected by the kernel,
>> one way or another.
> 
> Well, that's how it works today ;)

Well, it was designed like that on purpose, to be i) agnostic of the underlying
device, ii) to not clutter BPF API with tens of different APIs effectively doing
the same thing, and at the same time to avoid adding protocol specifics. E.g. at
least core bits of bpf_skb_{set,get}_tunnel_key() will work whether I use vxlan
or geneve underneath (we are actually using it this way) and I could use things
like tun_id to encode custom meta data from BPF for either of them depending on flavor
picked by orchestration system. For the tunnel options in bpf_skb_{set,get}_tunnel_opt()
it's similar although here there needs to be awareness of the underlying dev depending
on whether you encode data into e.g. gbp or tlvs, etc. However, downside right now I
can see with a patch like below is that:

i) People might still just keep using 'TUNNEL_OPTIONS_PRESENT path' since available
and backwards compatible with current/older kernels, ii) we cut bits away from
size over time for each new tunnel proto added in future that would support tunnel
options, iii) that extension is one-sided (at least below) and same would be needed
in getter part, and iv) there needs to be a way for the case when folks add new
tunnel options where we don't need to worry that we forget updating BPF_F_TUN_*
each time otherwise this will easily slip through and again people will just rely
on using TUNNEL_OPTIONS_PRESENT catchall. Given latter and in particular point i)
I wouldn't think it's worth the pain, the APIs were added to BPF in v4.6 so this
would buy them 2 more years wrt kernel compatibility with same functionality level.
And point v), I just noticed the patch is actually buggy: size is ARG_CONST_SIZE and
verifier will attempt to check the value whether the buffer passed in argument 2 is
valid or not, so using flags here in upper bits would let verification fail, you'd
really have to make a new helper just for this.

Best,
Daniel

>> I'm thinking about something like the BPF program voluntarily
>> specifying the type of the data; if not specified, the wildcard would be
>> used as it is now.
> 
> Hmm... in practice we could steal top bits of the size parameter for
> some flags, since it seems to be limited to values < 256 today?  Is it
> worth it?
> 
> It would look something along the lines of:
> 
> ---
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 59b19b6a40d7..194b40efa8e8 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2213,6 +2213,13 @@ enum bpf_func_id {
>  /* BPF_FUNC_perf_event_output for sk_buff input context. */
>  #define BPF_F_CTXLEN_MASK              (0xfffffULL << 32)
>  
> +#define BPF_F_TUN_VXLAN                        (1U << 31)
> +#define BPF_F_TUN_GENEVE               (1U << 30)
> +#define BPF_F_TUN_ERSPAN               (1U << 29)
> +#define BPF_F_TUN_FLAGS_ALL            (BPF_F_TUN_VXLAN | \
> +                                        BPF_F_TUN_GENEVE | \
> +                                        BPF_F_TUN_ERSPAN)
> +
>  /* Mode for BPF_FUNC_skb_adjust_room helper. */
>  enum bpf_adj_room_mode {
>         BPF_ADJ_ROOM_NET,
> diff --git a/net/core/filter.c b/net/core/filter.c
> index dade922678f6..cc592a1e8945 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3576,6 +3576,22 @@ BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
>  {
>         struct ip_tunnel_info *info = skb_tunnel_info(skb);
>         const struct metadata_dst *md = this_cpu_ptr(md_dst);
> +       __be16 tun_flags;
> +       u32 flags;
> +
> +       BUILD_BUG_ON(BPF_F_TUN_FLAGS_ALL & IP_TUNNEL_OPTS_MAX);
> +
> +       flags = size & BPF_F_TUN_FLAGS_ALL;
> +       size &= ~flags;
> +       if (flags & BPF_F_TUN_VXLAN)
> +               tun_flags |= TUNNEL_VXLAN_OPT;
> +       if (flags & BPF_F_TUN_GENEVE)
> +               tun_flags |= TUNNEL_GENEVE_OPT;
> +       if (flags & BPF_F_TUN_ERSPAN)
> +               tun_flags |= TUNNEL_ERSPAN_OPT;
> +       /* User didn't specify the tunnel type, for backward compat set all */
> +       if (!(tun_flags & TUNNEL_OPTIONS_PRESENT))
> +               tun_flags |= TUNNEL_OPTIONS_PRESENT;
>  
>         if (unlikely(info != &md->u.tun_info || (size & (sizeof(u32) - 1))))
>                 return -EINVAL;
>