lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+FuTSeUGsWH-GR7N_N7PChaW4S6Hucmvo_1s_9bbisxz1eOAA@mail.gmail.com>
Date:   Mon, 18 Nov 2019 12:23:09 -0500
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Martin Varghese <martinvarghesenokia@...il.com>
Cc:     Network Development <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>, corbet@....net,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        scott.drennan@...ia.com, Jiri Benc <jbenc@...hat.com>,
        martin.varghese@...ia.com
Subject: Re: [PATCH v3 net-next 1/2] UDP tunnel encapsulation module for
 tunnelling different protocols like MPLS,IP,NSH etc.

On Sat, Nov 16, 2019 at 12:45 AM Martin Varghese
<martinvarghesenokia@...il.com> wrote:
>
> From: Martin Varghese <martin.varghese@...ia.com>
>
> The Bareudp tunnel module provides a generic L3 encapsulation
> tunnelling module for tunnelling different protocols like MPLS,
> IP,NSH etc inside a UDP tunnel.
>
> Signed-off-by: Martin Varghese <martin.varghese@...ia.com>
> ---
> Changes in v2:
>      - Fixed documentation errors.
>      - Converted documentation to rst format.
>      - Moved ip tunnel rt lookup code to a common location.

This does not actually deduplicate with the existing code in geneve
(and vxlan, maybe others).

Mentioned a few more obvious examples inline. I won't keep harping on
this point, as I have no real stake in this.

But if you don't deduplicate from the start, I'm skeptical that it
will happen later. And then fixes to one driver are likely to be
missed for the other, slowly drifting the code further apart.

> +1) Device creation & deletion
> +
> +    a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847.
> +
> +       This creates a bareudp tunnel device which tunnels L3 traffic with ethertype
> +       0x8847 (MPLS traffic). The destination port of the UDP header will be set to
> +       6635.The device will listen on UDP port 6635 to receive traffic.
> +

Might be useful to also document how to use this on transmission.

> +struct bareudp_sock {
> +       struct socket           *sock;
> +       struct rcu_head         rcu;
> +       struct bareudp_dev      *bareudp;
> +};

Is this intermediate struct needed, or could struct bareudp_dev
directly hang off of sk_user_data if it gains a struct rcu_head?
Especially now that there is a 1:1 relationship between socket and
device.

> +       skb_push(skb, sizeof(struct ethhdr));
> +       skb_reset_mac_header(skb);
> +
> +       err = skb_ensure_writable(skb, sizeof(struct ethhdr));
> +       if (unlikely(err))
> +               goto drop;
> +
> +       eh = (struct ethhdr *)skb->data;
> +       eh->h_proto = proto;
> +       skb->protocol = eth_type_trans(skb, bareudp->dev);
> +       oiph = skb_network_header(skb);
> +       skb_reset_network_header(skb);

After decapsulation packets now have a fake ethernet header with
garbage in the src and dst address. Tcpdump will show that.

Instead of pseudo Ethernet, should this device be of link layer type
ARPHRD_NONE. Or alternatively a new type analogous to ARPHRD_TUNNEL
that includes the udp header?

The same can probably be said about geneve, so no strong objection to
what is here. Just a point for thought.

> +       if (bareudp_get_sk_family(bs) == AF_INET)
> +               err = IP_ECN_decapsulate(oiph, skb);
> +#if IS_ENABLED(CONFIG_IPV6)
> +       else
> +               err = IP6_ECN_decapsulate(oiph, skb);
> +#endif

with dual stack sockets, should this check now be based on packet
header instead of bareudp_get_sk_family?

> +       if (unlikely(err)) {
> +               if (log_ecn_error) {
> +                       if (bareudp_get_sk_family(bs) == AF_INET)
> +                               net_info_ratelimited("non-ECT from %pI4 "
> +                                                    "with TOS=%#x\n",
> +                                                    &((struct iphdr *)oiph)->saddr,
> +                                                    ((struct iphdr *)oiph)->tos);
> +#if IS_ENABLED(CONFIG_IPV6)
> +                       else
> +                               net_info_ratelimited("non-ECT from %pI6\n",
> +                                                    &((struct ipv6hdr *)oiph)->saddr);
> +#endif
> +               }
> +               if (err > 1) {
> +                       ++bareudp->dev->stats.rx_frame_errors;
> +                       ++bareudp->dev->stats.rx_errors;
> +                       goto drop;
> +               }
> +       }
> +
> +       len = skb->len;
> +       err = gro_cells_receive(&bareudp->gro_cells, skb);
> +       if (likely(err == NET_RX_SUCCESS)) {
> +               stats = this_cpu_ptr(bareudp->dev->tstats);
> +               u64_stats_update_begin(&stats->syncp);
> +               stats->rx_packets++;
> +               stats->rx_bytes += len;
> +               u64_stats_update_end(&stats->syncp);
> +       }
> +       return 0;
> +drop:
> +       /* Consume bad packet */
> +       kfree_skb(skb);
> +
> +       return 0;
> +}

All of this can pretty easily be shared with geneve.

> +static struct socket *bareudp_create_sock(struct net *net, bool ipv6,
> +                                         __be16 port)
> +{
> +       struct socket *sock;
> +       struct udp_port_cfg udp_conf;
> +       int err;
> +
> +       memset(&udp_conf, 0, sizeof(udp_conf));
> +#if IS_ENABLED(CONFIG_IPV6)
> +       udp_conf.family = AF_INET6;
> +#else
> +       udp_conf.family = AF_INET;
> +#endif

Still need to take bool ipv6 in account when creating the socket?

> +static int bareudp_sock_add(struct bareudp_dev *bareudp, bool ipv6)
> +{
> +       struct net *net = bareudp->net;
> +       struct bareudp_sock *bs;
> +
> +       bs = bareudp_socket_create(net, bareudp->conf.port, ipv6);
> +       if (IS_ERR(bs))
> +               return PTR_ERR(bs);
> +
> +       rcu_assign_pointer(bareudp->sock, bs);
> +       bs->bareudp = bareudp;
> +
> +       return 0;

Especially now that the device only has one socket, probably no need
to have a separate bareudp_socket_create from bareudp_sock_add.  Same
for __bareudp_sock_release and bareudp_sock_release.

> +}
> +
> +static void __bareudp_sock_release(struct bareudp_sock *bs)
> +{
> +       if (!bs)
> +               return;
> +
> +       udp_tunnel_sock_release(bs->sock);
> +       kfree_rcu(bs, rcu);
> +}
> +
> +static void bareudp_sock_release(struct bareudp_dev *bareudp)
> +{
> +       struct bareudp_sock *bs = rtnl_dereference(bareudp->sock);
> +
> +       rcu_assign_pointer(bareudp->sock, NULL);
> +       synchronize_net();
> +       __bareudp_sock_release(bs);
> +}
> +
> +static int bareudp_fill_metadata_dst(struct net_device *dev,
> +                                    struct sk_buff *skb)
> +{
> +       struct ip_tunnel_info *info = skb_tunnel_info(skb);
> +       struct bareudp_dev *bareudp = netdev_priv(dev);
> +       bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
> +
> +       if (ip_tunnel_info_af(info) == AF_INET) {
> +               struct rtable *rt;
> +               struct flowi4 fl4;
> +
> +               rt = iptunnel_get_v4_rt(skb, dev, bareudp->net, &fl4, info,
> +                                       use_cache);
> +               if (IS_ERR(rt))
> +                       return PTR_ERR(rt);
> +
> +               ip_rt_put(rt);
> +               info->key.u.ipv4.src = fl4.saddr;
> +#if IS_ENABLED(CONFIG_IPV6)
> +       } else if (ip_tunnel_info_af(info) == AF_INET6) {
> +               struct dst_entry *dst;
> +               struct flowi6 fl6;
> +               struct bareudp_sock *bs6 = rcu_dereference(bareudp->sock);
> +
> +               dst = ip6tunnel_get_dst(skb, dev, bareudp->net, bs6->sock, &fl6,
> +                                       info, use_cache);
> +               if (IS_ERR(dst))
> +                       return PTR_ERR(dst);
> +
> +               dst_release(dst);
> +               info->key.u.ipv6.src = fl6.saddr;
> +#endif
> +       } else {
> +               return -EINVAL;
> +       }
> +
> +       info->key.tp_src = udp_flow_src_port(bareudp->net, skb,
> +                                            bareudp->sport_min,
> +                                            USHRT_MAX, true);
> +       info->key.tp_dst = bareudp->conf.port;
> +       return 0;
> +}

This can probably all be deduplicated with geneve_fill_metadata_dst
once both use iptunnel_get_v4_rt.


> +static netdev_tx_t bareudp_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> +       struct bareudp_dev *bareudp = netdev_priv(dev);
> +       struct ip_tunnel_info *info = NULL;
> +       int err;
> +
> +       if (skb->protocol != bareudp->ethertype) {
> +               err = -EINVAL;
> +               goto tx_error;
> +       }
> +
> +       info = skb_tunnel_info(skb);

Like geneve, should this have two modes, either associated with LWT if
geneve->collect_md, else static for the device (geneve->info)? As
opposed to requiring collect_md mode.

> +/* Initialize the device structure. */
> +static void bareudp_setup(struct net_device *dev)
> +{
> +       ether_setup(dev);
> +
> +       dev->netdev_ops = &bareudp_netdev_ops;
> +       dev->ethtool_ops = &bareudp_ethtool_ops;
> +       dev->needs_free_netdev = true;
> +
> +       SET_NETDEV_DEVTYPE(dev, &bareudp_type);
> +
> +       dev->features    |= NETIF_F_SG | NETIF_F_HW_CSUM;
> +       dev->features    |= NETIF_F_RXCSUM;
> +       dev->features    |= NETIF_F_GSO_SOFTWARE;
> +
> +       dev->hw_features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
> +       dev->hw_features |= NETIF_F_GSO_SOFTWARE;
> +
> +       /* MTU range: 68 - (something less than 65535) */
> +       dev->min_mtu = ETH_MIN_MTU;
> +       dev->max_mtu = IP_MAX_MTU - BAREUDP_BASE_HLEN - dev->hard_header_len;

For IPv6, MSS is 64 kB without headers, so dev->max_mtu could perhaps
be IP_MAX_MTU. Also, why is hard_header_len subtracted? Again, same
for geneve, so if it works there, fine to leave here. Just something
that surprised me and might be worth giving another thought.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ