lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 9 Oct 2019 18:18:14 +0530
From:   Martin Varghese <martinvarghesenokia@...il.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>, corbet@....net,
        scott.drennan@...ia.com, Jiri Benc <jbenc@...hat.com>,
        martin.varghese@...ia.com
Subject: Re: [PATCH net-next 1/2] UDP tunnel encapsulation module for
 tunnelling different protocols like MPLS,IP,NSH etc.

On Tue, Oct 08, 2019 at 12:28:23PM -0400, Willem de Bruijn wrote:
> On Tue, Oct 8, 2019 at 5:51 AM Martin Varghese
> <martinvarghesenokia@...il.com> wrote:
> >
> > From: Martin <martin.varghese@...ia.com>
> >
> > The Bareudp tunnel module provides a generic L3 encapsulation
> > tunnelling module for tunnelling different protocols like MPLS,
> > IP,NSH etc inside a UDP tunnel.
> >
> > Signed-off-by: Martin Varghese <martinvarghesenokia@...il.com>
> > ---
> >  Documentation/networking/bareudp.txt |  23 +
> >  drivers/net/Kconfig                  |  13 +
> >  drivers/net/Makefile                 |   1 +
> >  drivers/net/bareudp.c                | 930 +++++++++++++++++++++++++++++++++++
> >  include/net/bareudp.h                |  19 +
> >  include/uapi/linux/if_link.h         |  12 +
> >  6 files changed, 998 insertions(+)
> >  create mode 100644 Documentation/networking/bareudp.txt
> >  create mode 100644 drivers/net/bareudp.c
> >  create mode 100644 include/net/bareudp.h
> >
> > diff --git a/Documentation/networking/bareudp.txt b/Documentation/networking/bareudp.txt
> > new file mode 100644
> > index 0000000..d2530e2
> > --- /dev/null
> > +++ b/Documentation/networking/bareudp.txt
> > @@ -0,0 +1,23 @@
> > +Bare UDP Tunnelling Module Documentation
> > +========================================
> > +
> > +There are various L3 encapsulation standards using UDP being discussed to
> > +leverage the UDP based load balancing capability of different networks.
> > +MPLSoUDP (https://tools.ietf.org/html/rfc7510)is one among them.
> > +
> > +The Bareudp tunnel module provides a generic L3 encapsulation tunnelling
> > +support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside
> > +a UDP tunnel.
> 
> This patch set introduces a lot of code, much of which duplicates
> existing tunnel devices, whether FOU using ipip tunnels and
> fou_build_header or separate devices like vxlan and geneve. Let's try
> to reuse what's there (and tested).
> 
> How does this differ from foo-over-udp (FOU). In supporting MPLS
> alongside IP? If so, can it reuse the existing code, possibly with
> patches to that existing tunnel logic?
> 
> I happened to have been playing around with MPLS in GRE recently. Have
> not tried the same over UDP tunnels, but most of it should apply?
> 
>   ip tunnel add gre1 mode gre local 192.168.1.1 remote 192.168.1.2 dev eth0
>   ip addr add 192.168.101.1 peer 192.168.101.2 dev gre1
>   ip link set dev gre1 up
> 
>   sysctl net.mpls.conf.gre1.input=1
>   sysctl net.mpls.platform_labels=17
>   ip addr add 192.168.201.1/24 dev gre1
>   ip route add 192.168.202.0/24 encap mpls 17 dev gre1
>   ip -f mpls route add 16 dev lo
> 
> 
unlike FOU which does l4 encapsulation, here we are trying to acheive l3 encapsulation.
For Example,  What is needed for MPLSoUDP is to take packets such as:

eth header | mpls header | payload

and encapsulate them to:

eth header | ip header | udp header | mpls header | payload
<--------- outer ------------------> <---- inner --------->

which is later decapsulated back to:

eth header | mpls header | payload

Note that in the decapsulated case, the ethertype in the eth header is ETH_P_MPLS_UC, while in the encapsulated case, the ethertype in the eth header is ETH_P_IP. When decapsulating, ETH_P_MPLS_UC is
restored based on the configured tunnel parameters.

This cannot be done with FOU. FOU needs an ipproto, not ethertype.

During receive FOU dispatches packet to l4 handlers which makes it impossible to patch it to get it working for l3 protocols like MPLS.

> > +static int bareudp_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
> > +{
> > +       struct bareudp_sock *bs;
> > +       struct ethhdr *eh;
> > +       struct bareudp_dev *bareudp;
> > +       struct metadata_dst *tun_dst = NULL;
> > +       struct pcpu_sw_netstats *stats;
> > +       unsigned int len;
> > +       int err = 0;
> > +       void *oiph;
> > +       u16 proto;
> > +
> > +       if (unlikely(!pskb_may_pull(skb, BAREUDP_BASE_HLEN)))
> > +               goto drop;
> > +
> > +       bs = rcu_dereference_sk_user_data(sk);
> > +       if (!bs)
> > +               goto drop;
> > +
> > +       bareudp = bs->bareudp;
> > +       proto = bareudp->ethertype;
> > +
> > +       if (iptunnel_pull_header(skb, BAREUDP_BASE_HLEN,
> > +                                proto,
> > +                                !net_eq(bareudp->net,
> > +                                        dev_net(bareudp->dev)))) {
> > +               bareudp->dev->stats.rx_dropped++;
> > +               goto drop;
> > +       }
> > +       tun_dst = udp_tun_rx_dst(skb, bareudp_get_sk_family(bs), TUNNEL_KEY,
> > +                                0, 0);
> > +       if (!tun_dst) {
> > +               bareudp->dev->stats.rx_dropped++;
> > +               goto drop;
> > +       }
> > +       skb_dst_set(skb, &tun_dst->dst);
> 
> Is this dst metadata a firm requirement? It is optional in vxlan, say.
> If here, too, please split such parts off into separate follow-on
> patches.
> 
>
Yes it is. 
> > +       skb_push(skb, sizeof(struct ethhdr));
> > +       eh = (struct ethhdr *)skb->data;
> > +       eh->h_proto = proto;
> > +
> > +       skb_reset_mac_header(skb);
> > +       skb->protocol = eth_type_trans(skb, bareudp->dev);
> > +       skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
> > +       oiph = skb_network_header(skb);
> > +       skb_reset_network_header(skb);
> > +
> > +       if (bareudp_get_sk_family(bs) == AF_INET)
> 
> This should be derived from packet contents, not socket state.
> Although the one implies the other, I imagine.
> 
> > +static struct rtable *bareudp_get_v4_rt(struct sk_buff *skb,
> > +                                       struct net_device *dev,
> > +                                       struct bareudp_sock *bs4,
> > +                                       struct flowi4 *fl4,
> > +                                       const struct ip_tunnel_info *info)
> > +{
> > +       bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
> > +       struct bareudp_dev *bareudp = netdev_priv(dev);
> > +       struct dst_cache *dst_cache;
> > +       struct rtable *rt = NULL;
> > +       __u8 tos;
> > +
> > +       if (!bs4)
> > +               return ERR_PTR(-EIO);
> > +
> > +       memset(fl4, 0, sizeof(*fl4));
> > +       fl4->flowi4_mark = skb->mark;
> > +       fl4->flowi4_proto = IPPROTO_UDP;
> > +       fl4->daddr = info->key.u.ipv4.dst;
> > +       fl4->saddr = info->key.u.ipv4.src;
> > +
> > +       tos = info->key.tos;
> > +       fl4->flowi4_tos = RT_TOS(tos);
> > +
> > +       dst_cache = (struct dst_cache *)&info->dst_cache;
> > +       if (use_cache) {
> > +               rt = dst_cache_get_ip4(dst_cache, &fl4->saddr);
> > +               if (rt)
> > +                       return rt;
> > +       }
> > +       rt = ip_route_output_key(bareudp->net, fl4);
> > +       if (IS_ERR(rt)) {
> > +               netdev_dbg(dev, "no route to %pI4\n", &fl4->daddr);
> > +               return ERR_PTR(-ENETUNREACH);
> > +       }
> > +       if (rt->dst.dev == dev) { /* is this necessary? */
> > +               netdev_dbg(dev, "circular route to %pI4\n", &fl4->daddr);
> > +               ip_rt_put(rt);
> > +               return ERR_PTR(-ELOOP);
> > +       }
> > +       if (use_cache)
> > +               dst_cache_set_ip4(dst_cache, &rt->dst, fl4->saddr);
> > +       return rt;
> > +}
> > +
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +static struct dst_entry *bareudp_get_v6_dst(struct sk_buff *skb,
> > +                                           struct net_device *dev,
> > +                                           struct bareudp_sock *bs6,
> > +                                           struct flowi6 *fl6,
> > +                                           const struct ip_tunnel_info *info)
> > +{
> > +       bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
> > +       struct bareudp_dev *bareudp = netdev_priv(dev);
> > +       struct dst_entry *dst = NULL;
> > +       struct dst_cache *dst_cache;
> > +       __u8 prio;
> > +
> > +       if (!bs6)
> > +               return ERR_PTR(-EIO);
> > +
> > +       memset(fl6, 0, sizeof(*fl6));
> > +       fl6->flowi6_mark = skb->mark;
> > +       fl6->flowi6_proto = IPPROTO_UDP;
> > +       fl6->daddr = info->key.u.ipv6.dst;
> > +       fl6->saddr = info->key.u.ipv6.src;
> > +       prio = info->key.tos;
> > +
> > +       fl6->flowlabel = ip6_make_flowinfo(RT_TOS(prio),
> > +                                          info->key.label);
> > +       dst_cache = (struct dst_cache *)&info->dst_cache;
> > +       if (use_cache) {
> > +               dst = dst_cache_get_ip6(dst_cache, &fl6->saddr);
> > +               if (dst)
> > +                       return dst;
> > +       }
> > +       if (ipv6_stub->ipv6_dst_lookup(bareudp->net, bs6->sock->sk, &dst,
> > +                                      fl6)) {
> > +               netdev_dbg(dev, "no route to %pI6\n", &fl6->daddr);
> > +               return ERR_PTR(-ENETUNREACH);
> > +       }
> > +       if (dst->dev == dev) { /* is this necessary? */
> > +               netdev_dbg(dev, "circular route to %pI6\n", &fl6->daddr);
> > +               dst_release(dst);
> > +               return ERR_PTR(-ELOOP);
> > +       }
> > +
> > +       if (use_cache)
> > +               dst_cache_set_ip6(dst_cache, dst, &fl6->saddr);
> > +       return dst;
> > +}
> > +#endif
> 
> The route lookup logic is very similar to vxlan_get_route and
> vxlan6_get_route. Can be reused?

Yes  we could. We can move these to a common place. include/net/ip_tunnels.h ? 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ