lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 28 Nov 2019 21:54:27 +0530
From:   Martin Varghese <martinvarghesenokia@...il.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>, corbet@....net,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        scott.drennan@...ia.com, Jiri Benc <jbenc@...hat.com>,
        martin.varghese@...ia.com
Subject: Re: [PATCH v3 net-next 1/2] UDP tunnel encapsulation module for
 tunnelling different protocols like MPLS,IP,NSH etc.

On Mon, Nov 18, 2019 at 12:23:09PM -0500, Willem de Bruijn wrote:
> On Sat, Nov 16, 2019 at 12:45 AM Martin Varghese
> <martinvarghesenokia@...il.com> wrote:
> >
> > From: Martin Varghese <martin.varghese@...ia.com>
> >
> > The Bareudp tunnel module provides a generic L3 encapsulation
> > tunnelling module for tunnelling different protocols like MPLS,
> > IP,NSH etc inside a UDP tunnel.
> >
> > Signed-off-by: Martin Varghese <martin.varghese@...ia.com>
> > ---
> > Changes in v2:
> >      - Fixed documentation errors.
> >      - Converted documentation to rst format.
> >      - Moved ip tunnel rt lookup code to a common location.
> 
> This does not actually deduplicate with the existing code in geneve
> (and vxlan, maybe others).
> 
> Mentioned a few more obvious examples inline. I won't keep harping on
> this point, as I have no real stake in this.
> 
> But if you don't deduplicate from the start, I'm skeptical that it
> will happen later. And then fixes to one driver are likely to be
> missed for the other, slowly drifting the code further apart.
> 

Adaptation of geneve module to use the generic API will be in a 
seperate patch series.

> > +1) Device creation & deletion
> > +
> > +    a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847.
> > +
> > +       This creates a bareudp tunnel device which tunnels L3 traffic with ethertype
> > +       0x8847 (MPLS traffic). The destination port of the UDP header will be set to
> > +       6635.The device will listen on UDP port 6635 to receive traffic.
> > +
> 
> Might be useful to also document how to use this on transmission.
> 
> > +struct bareudp_sock {
> > +       struct socket           *sock;
> > +       struct rcu_head         rcu;
> > +       struct bareudp_dev      *bareudp;
> > +};
> 
> Is this intermediate struct needed, or could struct bareudp_dev
> directly hang off of sk_user_data if it gains a struct rcu_head?
> Especially now that there is a 1:1 relationship between socket and
> device.
> 
will check.

> > +       skb_push(skb, sizeof(struct ethhdr));
> > +       skb_reset_mac_header(skb);
> > +
> > +       err = skb_ensure_writable(skb, sizeof(struct ethhdr));
> > +       if (unlikely(err))
> > +               goto drop;
> > +
> > +       eh = (struct ethhdr *)skb->data;
> > +       eh->h_proto = proto;
> > +       skb->protocol = eth_type_trans(skb, bareudp->dev);
> > +       oiph = skb_network_header(skb);
> > +       skb_reset_network_header(skb);
> 
> After decapsulation packets now have a fake ethernet header with
> garbage in the src and dst address. Tcpdump will show that.
> 
> Instead of pseudo Ethernet, should this device be of link layer type
> ARPHRD_NONE. Or alternatively a new type analogous to ARPHRD_TUNNEL
> that includes the udp header?
> 
> The same can probably be said about geneve, so no strong objection to
> what is here. Just a point for thought.
>

I am pondering more on this comment. 
> > +       if (bareudp_get_sk_family(bs) == AF_INET)
> > +               err = IP_ECN_decapsulate(oiph, skb);
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +       else
> > +               err = IP6_ECN_decapsulate(oiph, skb);
> > +#endif
> 
> with dual stack sockets, should this check now be based on packet
> header instead of bareudp_get_sk_family?
> 
Noted

> > +       if (unlikely(err)) {
> > +               if (log_ecn_error) {
> > +                       if (bareudp_get_sk_family(bs) == AF_INET)
> > +                               net_info_ratelimited("non-ECT from %pI4 "
> > +                                                    "with TOS=%#x\n",
> > +                                                    &((struct iphdr *)oiph)->saddr,
> > +                                                    ((struct iphdr *)oiph)->tos);
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +                       else
> > +                               net_info_ratelimited("non-ECT from %pI6\n",
> > +                                                    &((struct ipv6hdr *)oiph)->saddr);
> > +#endif
> > +               }
> > +               if (err > 1) {
> > +                       ++bareudp->dev->stats.rx_frame_errors;
> > +                       ++bareudp->dev->stats.rx_errors;
> > +                       goto drop;
> > +               }
> > +       }
> > +
> > +       len = skb->len;
> > +       err = gro_cells_receive(&bareudp->gro_cells, skb);
> > +       if (likely(err == NET_RX_SUCCESS)) {
> > +               stats = this_cpu_ptr(bareudp->dev->tstats);
> > +               u64_stats_update_begin(&stats->syncp);
> > +               stats->rx_packets++;
> > +               stats->rx_bytes += len;
> > +               u64_stats_update_end(&stats->syncp);
> > +       }
> > +       return 0;
> > +drop:
> > +       /* Consume bad packet */
> > +       kfree_skb(skb);
> > +
> > +       return 0;
> > +}
> 
> All of this can pretty easily be shared with geneve.
>
if we remove the bareudp_get_sk_family(bs) check.we may mot
have lot of common code.will check. 

> > +static struct socket *bareudp_create_sock(struct net *net, bool ipv6,
> > +                                         __be16 port)
> > +{
> > +       struct socket *sock;
> > +       struct udp_port_cfg udp_conf;
> > +       int err;
> > +
> > +       memset(&udp_conf, 0, sizeof(udp_conf));
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +       udp_conf.family = AF_INET6;
> > +#else
> > +       udp_conf.family = AF_INET;
> > +#endif
> 
> Still need to take bool ipv6 in account when creating the socket?
> 
noted
> > +static int bareudp_sock_add(struct bareudp_dev *bareudp, bool ipv6)
> > +{
> > +       struct net *net = bareudp->net;
> > +       struct bareudp_sock *bs;
> > +
> > +       bs = bareudp_socket_create(net, bareudp->conf.port, ipv6);
> > +       if (IS_ERR(bs))
> > +               return PTR_ERR(bs);
> > +
> > +       rcu_assign_pointer(bareudp->sock, bs);
> > +       bs->bareudp = bareudp;
> > +
> > +       return 0;
> 
> Especially now that the device only has one socket, probably no need
> to have a separate bareudp_socket_create from bareudp_sock_add.  Same
> for __bareudp_sock_release and bareudp_sock_release.
>
NOted 
> > +}
> > +
> > +static void __bareudp_sock_release(struct bareudp_sock *bs)
> > +{
> > +       if (!bs)
> > +               return;
> > +
> > +       udp_tunnel_sock_release(bs->sock);
> > +       kfree_rcu(bs, rcu);
> > +}
> > +
> > +static void bareudp_sock_release(struct bareudp_dev *bareudp)
> > +{
> > +       struct bareudp_sock *bs = rtnl_dereference(bareudp->sock);
> > +
> > +       rcu_assign_pointer(bareudp->sock, NULL);
> > +       synchronize_net();
> > +       __bareudp_sock_release(bs);
> > +}
> > +
> > +static int bareudp_fill_metadata_dst(struct net_device *dev,
> > +                                    struct sk_buff *skb)
> > +{
> > +       struct ip_tunnel_info *info = skb_tunnel_info(skb);
> > +       struct bareudp_dev *bareudp = netdev_priv(dev);
> > +       bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
> > +
> > +       if (ip_tunnel_info_af(info) == AF_INET) {
> > +               struct rtable *rt;
> > +               struct flowi4 fl4;
> > +
> > +               rt = iptunnel_get_v4_rt(skb, dev, bareudp->net, &fl4, info,
> > +                                       use_cache);
> > +               if (IS_ERR(rt))
> > +                       return PTR_ERR(rt);
> > +
> > +               ip_rt_put(rt);
> > +               info->key.u.ipv4.src = fl4.saddr;
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +       } else if (ip_tunnel_info_af(info) == AF_INET6) {
> > +               struct dst_entry *dst;
> > +               struct flowi6 fl6;
> > +               struct bareudp_sock *bs6 = rcu_dereference(bareudp->sock);
> > +
> > +               dst = ip6tunnel_get_dst(skb, dev, bareudp->net, bs6->sock, &fl6,
> > +                                       info, use_cache);
> > +               if (IS_ERR(dst))
> > +                       return PTR_ERR(dst);
> > +
> > +               dst_release(dst);
> > +               info->key.u.ipv6.src = fl6.saddr;
> > +#endif
> > +       } else {
> > +               return -EINVAL;
> > +       }
> > +
> > +       info->key.tp_src = udp_flow_src_port(bareudp->net, skb,
> > +                                            bareudp->sport_min,
> > +                                            USHRT_MAX, true);
> > +       info->key.tp_dst = bareudp->conf.port;
> > +       return 0;
> > +}
> 
> This can probably all be deduplicated with geneve_fill_metadata_dst
> once both use iptunnel_get_v4_rt.
>

Do you have any preference of file to keep the common function 
> 
> > +static netdev_tx_t bareudp_xmit(struct sk_buff *skb, struct net_device *dev)
> > +{
> > +       struct bareudp_dev *bareudp = netdev_priv(dev);
> > +       struct ip_tunnel_info *info = NULL;
> > +       int err;
> > +
> > +       if (skb->protocol != bareudp->ethertype) {
> > +               err = -EINVAL;
> > +               goto tx_error;
> > +       }
> > +
> > +       info = skb_tunnel_info(skb);
> 
> Like geneve, should this have two modes, either associated with LWT if
> geneve->collect_md, else static for the device (geneve->info)? As
> opposed to requiring collect_md mode.
> 
This module will be used with OVS mostly and so it need not
support static mode.

> > +/* Initialize the device structure. */
> > +static void bareudp_setup(struct net_device *dev)
> > +{
> > +       ether_setup(dev);
> > +
> > +       dev->netdev_ops = &bareudp_netdev_ops;
> > +       dev->ethtool_ops = &bareudp_ethtool_ops;
> > +       dev->needs_free_netdev = true;
> > +
> > +       SET_NETDEV_DEVTYPE(dev, &bareudp_type);
> > +
> > +       dev->features    |= NETIF_F_SG | NETIF_F_HW_CSUM;
> > +       dev->features    |= NETIF_F_RXCSUM;
> > +       dev->features    |= NETIF_F_GSO_SOFTWARE;
> > +
> > +       dev->hw_features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
> > +       dev->hw_features |= NETIF_F_GSO_SOFTWARE;
> > +
> > +       /* MTU range: 68 - (something less than 65535) */
> > +       dev->min_mtu = ETH_MIN_MTU;
> > +       dev->max_mtu = IP_MAX_MTU - BAREUDP_BASE_HLEN - dev->hard_header_len;
> 
> For IPv6, MSS is 64 kB without headers, so dev->max_mtu could perhaps
> be IP_MAX_MTU. Also, why is hard_header_len subtracted? Again, same
> for geneve, so if it works there, fine to leave here. Just something
> that surprised me and might be worth giving another thought.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ