linux-kernel - Re: [PATCH net-next 2/2] tun: allow to attach ebpf socket filter

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b505ed70-8359-2d7b-7062-8a351687187f@redhat.com>
Date:   Tue, 2 Jan 2018 11:28:30 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH net-next 2/2] tun: allow to attach ebpf socket filter



On 2017年12月31日 18:14, Willem de Bruijn wrote:
> On Fri, Dec 29, 2017 at 3:44 AM, Jason Wang <jasowang@...hat.com> wrote:
>> This patch allows userspace to attach eBPF filter to tun. This will
>> allow to implement VM dataplane filtering in a more efficient way
>> compared to cBPF filter.
> Is the idea to allow the trusted hypervisor to install these programs,
> or the untrusted guests?

Not from guest (but I'm really considering bpf/XDP offload in the 
future, as you suggested in the pass, we probably need a new type other 
than socket filter), the main motivation is to implement vhost-net 
filtering. The idea is let qemu to attach eBPF socket filter to tun.


>
> eBPF privilege escalations like those recently described in
> https://lwn.net/Articles/742170/ would give me pause to expose
> this to guests.

Right, if underprivileged bpf is disabled by the distribution, it would 
be hard (but still possible):

- Qemu will notify libvirt about rx filter changes of virtio-net
- Libvirt will reprogram the ebpf filter

>
>> Signed-off-by: Jason Wang <jasowang@...hat.com>
>> ---
>>   drivers/net/tun.c           | 26 ++++++++++++++++++++++++++
>>   include/uapi/linux/if_tun.h |  1 +
>>   2 files changed, 27 insertions(+)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 0853829..6e9452b 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -238,6 +238,7 @@ struct tun_struct {
>>          struct tun_pcpu_stats __percpu *pcpu_stats;
>>          struct bpf_prog __rcu *xdp_prog;
>>          struct tun_prog __rcu *steering_prog;
>> +       struct tun_prog __rcu *filter_prog;
>>   };
>>
>>   static int tun_napi_receive(struct napi_struct *napi, int budget)
>> @@ -984,12 +985,25 @@ static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb)
>>   #endif
>>   }
>>
>> +static unsigned int run_ebpf_filter(struct tun_struct *tun,
>> +                                   struct sk_buff *skb,
>> +                                   int len)
>> +{
>> +       struct tun_prog *prog = rcu_dereference(tun->filter_prog);
>> +
>> +       if (prog)
>> +               len = bpf_prog_run_clear_cb(prog->prog, skb);
>> +
>> +       return len;
>> +}
>> +
>>   /* Net device start xmit */
>>   static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>   {
>>          struct tun_struct *tun = netdev_priv(dev);
>>          int txq = skb->queue_mapping;
>>          struct tun_file *tfile;
>> +       int len = skb->len;
>>
>>          rcu_read_lock();
>>          tfile = rcu_dereference(tun->tfiles[txq]);
>> @@ -1015,9 +1029,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>              sk_filter(tfile->socket.sk, skb))
>>                  goto drop;
>>
>> +       len = run_ebpf_filter(tun, skb, len);
>> +       if (!len)
>> +               goto drop;
>> +
> This adds a second filter step independent of the sk_filter call above.
> Perhaps the two filter interfaces can map onto to the same instance.
> I imagine that qemu never programs SO_ATTACH_FILTER.

I think you mean TUNATTACHFILTER here (and we could not assume the tun 
is only used by qemu). A quick glance does not give any idea on how to 
reuse it for eBPF or differ eBPF from cBPF.

Btw, there're other differences. TUNATTACHBPF attach the prog to tun 
which means it simplifies lots of things e.g persist devices or queue 
enabling/disabling.  But TUNATTACHFILTER attach the prog to socket.

>
> More importantly, should this program just return a boolean pass or
> drop. Taking a length and trimming may introduce bugs later on if the
> stack parses the packet unconditionally, expecting a minimum size
> to be present.
>
> This was the reason for introducing sk_filter_trim_cap and using that
> in other sk_filter sites.
>
> A quick scan shows that tun_put_user expects a full vlan tag to exist
> if skb_vlan_tag_present(skb), for instance. If trimmed to below this
> length the final call to skb_copy_datagram_iter may have negative
> length.
>
> This is an issue with the existing sk_filter call as much as with the
> new run_ebpf_filter call.

Good point, so consider it was used by sk_filter too, we need to fix it 
anyway. Actually, I've considered the boolean return value but finally I 
decide to obey the style of sk filter. Maybe the trimming has real user. 
e.g high speed header recoding/analysis? Consider it's not hard to fix, 
how about just keep that?

Thanks

>
>>          if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC)))
>>                  goto drop;
>>
>> +       if (pskb_trim(skb, len))
>> +               goto drop;
>> +
>>          skb_tx_timestamp(skb);