Message-ID: <CAF=yD-+vW9TBoJ6RBYYuyWHfNcTHPL6CvBYwfKS59ww3yZsZ8A@mail.gmail.com>
Date: Sun, 31 Dec 2017 11:14:19 +0100
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jason Wang <jasowang@...hat.com>
Cc: Network Development <netdev@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
"Michael S. Tsirkin" <mst@...hat.com>,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH net-next 2/2] tun: allow to attach ebpf socket filter

On Fri, Dec 29, 2017 at 3:44 AM, Jason Wang <jasowang@...hat.com> wrote:
> This patch allows userspace to attach an eBPF filter to tun. This will
> allow implementing VM dataplane filtering more efficiently than with a
> cBPF filter.

Is the idea to allow the trusted hypervisor to install these programs,
or the untrusted guests?

eBPF privilege escalations like those recently described in
https://lwn.net/Articles/742170/ would give me pause to expose
this to guests.

> Signed-off-by: Jason Wang <jasowang@...hat.com>
> ---
> drivers/net/tun.c | 26 ++++++++++++++++++++++++++
> include/uapi/linux/if_tun.h | 1 +
> 2 files changed, 27 insertions(+)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 0853829..6e9452b 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -238,6 +238,7 @@ struct tun_struct {
>          struct tun_pcpu_stats __percpu *pcpu_stats;
>          struct bpf_prog __rcu *xdp_prog;
>          struct tun_prog __rcu *steering_prog;
> +        struct tun_prog __rcu *filter_prog;
> };
>
> static int tun_napi_receive(struct napi_struct *napi, int budget)
> @@ -984,12 +985,25 @@ static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb)
> #endif
> }
>
> +static unsigned int run_ebpf_filter(struct tun_struct *tun,
> +                                    struct sk_buff *skb,
> +                                    int len)
> +{
> +        struct tun_prog *prog = rcu_dereference(tun->filter_prog);
> +
> +        if (prog)
> +                len = bpf_prog_run_clear_cb(prog->prog, skb);
> +
> +        return len;
> +}
> +
> /* Net device start xmit */
> static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> {
>          struct tun_struct *tun = netdev_priv(dev);
>          int txq = skb->queue_mapping;
>          struct tun_file *tfile;
> +        int len = skb->len;
>
>          rcu_read_lock();
>          tfile = rcu_dereference(tun->tfiles[txq]);
> @@ -1015,9 +1029,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>              sk_filter(tfile->socket.sk, skb))
>                  goto drop;
>
> +        len = run_ebpf_filter(tun, skb, len);
> +        if (!len)
> +                goto drop;
> +

This adds a second filter step independent of the sk_filter call above.
Perhaps the two filter interfaces can map onto the same instance.
I imagine that qemu never programs SO_ATTACH_FILTER.

More importantly, should this program just return a boolean pass or
drop verdict? Taking a length and trimming may introduce bugs later on
if the stack parses the packet unconditionally, expecting a minimum
size to be present.
This was the reason for introducing sk_filter_trim_cap and using that
in other sk_filter sites.
A quick scan shows that tun_put_user expects a full vlan tag to exist
if skb_vlan_tag_present(skb), for instance. If trimmed below this
length, the final call to skb_copy_datagram_iter may be passed a
negative length.
This is an issue with the existing sk_filter call as much as with the
new run_ebpf_filter call.

>          if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC)))
>                  goto drop;
>
> +        if (pskb_trim(skb, len))
> +                goto drop;
> +
>          skb_tx_timestamp(skb);