[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191128162427.GB2633@martin-VirtualBox>
Date: Thu, 28 Nov 2019 21:54:27 +0530
From: Martin Varghese <martinvarghesenokia@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Network Development <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>, corbet@....net,
Alexey Kuznetsov <kuznet@....inr.ac.ru>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
scott.drennan@...ia.com, Jiri Benc <jbenc@...hat.com>,
martin.varghese@...ia.com
Subject: Re: [PATCH v3 net-next 1/2] UDP tunnel encapsulation module for
tunnelling different protocols like MPLS,IP,NSH etc.
On Mon, Nov 18, 2019 at 12:23:09PM -0500, Willem de Bruijn wrote:
> On Sat, Nov 16, 2019 at 12:45 AM Martin Varghese
> <martinvarghesenokia@...il.com> wrote:
> >
> > From: Martin Varghese <martin.varghese@...ia.com>
> >
> > The Bareudp tunnel module provides a generic L3 encapsulation
> > tunnelling module for tunnelling different protocols like MPLS,
> > IP,NSH etc inside a UDP tunnel.
> >
> > Signed-off-by: Martin Varghese <martin.varghese@...ia.com>
> > ---
> > Changes in v2:
> > - Fixed documentation errors.
> > - Converted documentation to rst format.
> > - Moved ip tunnel rt lookup code to a common location.
>
> This does not actually deduplicate with the existing code in geneve
> (and vxlan, maybe others).
>
> Mentioned a few more obvious examples inline. I won't keep harping on
> this point, as I have no real stake in this.
>
> But if you don't deduplicate from the start, I'm skeptical that it
> will happen later. And then fixes to one driver are likely to be
> missed for the other, slowly drifting the code further apart.
>
Adaptation of geneve module to use the generic API will be in a
seperate patch series.
> > +1) Device creation & deletion
> > +
> > + a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847.
> > +
> > + This creates a bareudp tunnel device which tunnels L3 traffic with ethertype
> > + 0x8847 (MPLS traffic). The destination port of the UDP header will be set to
> > + 6635.The device will listen on UDP port 6635 to receive traffic.
> > +
>
> Might be useful to also document how to use this on transmission.
>
> > +struct bareudp_sock {
> > + struct socket *sock;
> > + struct rcu_head rcu;
> > + struct bareudp_dev *bareudp;
> > +};
>
> Is this intermediate struct needed, or could struct bareudp_dev
> directly hang off of sk_user_data if it gains a struct rcu_head?
> Especially now that there is a 1:1 relationship between socket and
> device.
>
will check.
> > + skb_push(skb, sizeof(struct ethhdr));
> > + skb_reset_mac_header(skb);
> > +
> > + err = skb_ensure_writable(skb, sizeof(struct ethhdr));
> > + if (unlikely(err))
> > + goto drop;
> > +
> > + eh = (struct ethhdr *)skb->data;
> > + eh->h_proto = proto;
> > + skb->protocol = eth_type_trans(skb, bareudp->dev);
> > + oiph = skb_network_header(skb);
> > + skb_reset_network_header(skb);
>
> After decapsulation packets now have a fake ethernet header with
> garbage in the src and dst address. Tcpdump will show that.
>
> Instead of pseudo Ethernet, should this device be of link layer type
> ARPHRD_NONE. Or alternatively a new type analogous to ARPHRD_TUNNEL
> that includes the udp header?
>
> The same can probably be said about geneve, so no strong objection to
> what is here. Just a point for thought.
>
I am pondering more on this comment.
> > + if (bareudp_get_sk_family(bs) == AF_INET)
> > + err = IP_ECN_decapsulate(oiph, skb);
> > +#if IS_ENABLED(CONFIG_IPV6)
> > + else
> > + err = IP6_ECN_decapsulate(oiph, skb);
> > +#endif
>
> with dual stack sockets, should this check now be based on packet
> header instead of bareudp_get_sk_family?
>
Noted
> > + if (unlikely(err)) {
> > + if (log_ecn_error) {
> > + if (bareudp_get_sk_family(bs) == AF_INET)
> > + net_info_ratelimited("non-ECT from %pI4 "
> > + "with TOS=%#x\n",
> > + &((struct iphdr *)oiph)->saddr,
> > + ((struct iphdr *)oiph)->tos);
> > +#if IS_ENABLED(CONFIG_IPV6)
> > + else
> > + net_info_ratelimited("non-ECT from %pI6\n",
> > + &((struct ipv6hdr *)oiph)->saddr);
> > +#endif
> > + }
> > + if (err > 1) {
> > + ++bareudp->dev->stats.rx_frame_errors;
> > + ++bareudp->dev->stats.rx_errors;
> > + goto drop;
> > + }
> > + }
> > +
> > + len = skb->len;
> > + err = gro_cells_receive(&bareudp->gro_cells, skb);
> > + if (likely(err == NET_RX_SUCCESS)) {
> > + stats = this_cpu_ptr(bareudp->dev->tstats);
> > + u64_stats_update_begin(&stats->syncp);
> > + stats->rx_packets++;
> > + stats->rx_bytes += len;
> > + u64_stats_update_end(&stats->syncp);
> > + }
> > + return 0;
> > +drop:
> > + /* Consume bad packet */
> > + kfree_skb(skb);
> > +
> > + return 0;
> > +}
>
> All of this can pretty easily be shared with geneve.
>
if we remove the bareudp_get_sk_family(bs) check.we may mot
have lot of common code.will check.
> > +static struct socket *bareudp_create_sock(struct net *net, bool ipv6,
> > + __be16 port)
> > +{
> > + struct socket *sock;
> > + struct udp_port_cfg udp_conf;
> > + int err;
> > +
> > + memset(&udp_conf, 0, sizeof(udp_conf));
> > +#if IS_ENABLED(CONFIG_IPV6)
> > + udp_conf.family = AF_INET6;
> > +#else
> > + udp_conf.family = AF_INET;
> > +#endif
>
> Still need to take bool ipv6 in account when creating the socket?
>
noted
> > +static int bareudp_sock_add(struct bareudp_dev *bareudp, bool ipv6)
> > +{
> > + struct net *net = bareudp->net;
> > + struct bareudp_sock *bs;
> > +
> > + bs = bareudp_socket_create(net, bareudp->conf.port, ipv6);
> > + if (IS_ERR(bs))
> > + return PTR_ERR(bs);
> > +
> > + rcu_assign_pointer(bareudp->sock, bs);
> > + bs->bareudp = bareudp;
> > +
> > + return 0;
>
> Especially now that the device only has one socket, probably no need
> to have a separate bareudp_socket_create from bareudp_sock_add. Same
> for __bareudp_sock_release and bareudp_sock_release.
>
NOted
> > +}
> > +
> > +static void __bareudp_sock_release(struct bareudp_sock *bs)
> > +{
> > + if (!bs)
> > + return;
> > +
> > + udp_tunnel_sock_release(bs->sock);
> > + kfree_rcu(bs, rcu);
> > +}
> > +
> > +static void bareudp_sock_release(struct bareudp_dev *bareudp)
> > +{
> > + struct bareudp_sock *bs = rtnl_dereference(bareudp->sock);
> > +
> > + rcu_assign_pointer(bareudp->sock, NULL);
> > + synchronize_net();
> > + __bareudp_sock_release(bs);
> > +}
> > +
> > +static int bareudp_fill_metadata_dst(struct net_device *dev,
> > + struct sk_buff *skb)
> > +{
> > + struct ip_tunnel_info *info = skb_tunnel_info(skb);
> > + struct bareudp_dev *bareudp = netdev_priv(dev);
> > + bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
> > +
> > + if (ip_tunnel_info_af(info) == AF_INET) {
> > + struct rtable *rt;
> > + struct flowi4 fl4;
> > +
> > + rt = iptunnel_get_v4_rt(skb, dev, bareudp->net, &fl4, info,
> > + use_cache);
> > + if (IS_ERR(rt))
> > + return PTR_ERR(rt);
> > +
> > + ip_rt_put(rt);
> > + info->key.u.ipv4.src = fl4.saddr;
> > +#if IS_ENABLED(CONFIG_IPV6)
> > + } else if (ip_tunnel_info_af(info) == AF_INET6) {
> > + struct dst_entry *dst;
> > + struct flowi6 fl6;
> > + struct bareudp_sock *bs6 = rcu_dereference(bareudp->sock);
> > +
> > + dst = ip6tunnel_get_dst(skb, dev, bareudp->net, bs6->sock, &fl6,
> > + info, use_cache);
> > + if (IS_ERR(dst))
> > + return PTR_ERR(dst);
> > +
> > + dst_release(dst);
> > + info->key.u.ipv6.src = fl6.saddr;
> > +#endif
> > + } else {
> > + return -EINVAL;
> > + }
> > +
> > + info->key.tp_src = udp_flow_src_port(bareudp->net, skb,
> > + bareudp->sport_min,
> > + USHRT_MAX, true);
> > + info->key.tp_dst = bareudp->conf.port;
> > + return 0;
> > +}
>
> This can probably all be deduplicated with geneve_fill_metadata_dst
> once both use iptunnel_get_v4_rt.
>
Do you have any preference of file to keep the common function
>
> > +static netdev_tx_t bareudp_xmit(struct sk_buff *skb, struct net_device *dev)
> > +{
> > + struct bareudp_dev *bareudp = netdev_priv(dev);
> > + struct ip_tunnel_info *info = NULL;
> > + int err;
> > +
> > + if (skb->protocol != bareudp->ethertype) {
> > + err = -EINVAL;
> > + goto tx_error;
> > + }
> > +
> > + info = skb_tunnel_info(skb);
>
> Like geneve, should this have two modes, either associated with LWT if
> geneve->collect_md, else static for the device (geneve->info)? As
> opposed to requiring collect_md mode.
>
This module will be used with OVS mostly and so it need not
support static mode.
> > +/* Initialize the device structure. */
> > +static void bareudp_setup(struct net_device *dev)
> > +{
> > + ether_setup(dev);
> > +
> > + dev->netdev_ops = &bareudp_netdev_ops;
> > + dev->ethtool_ops = &bareudp_ethtool_ops;
> > + dev->needs_free_netdev = true;
> > +
> > + SET_NETDEV_DEVTYPE(dev, &bareudp_type);
> > +
> > + dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM;
> > + dev->features |= NETIF_F_RXCSUM;
> > + dev->features |= NETIF_F_GSO_SOFTWARE;
> > +
> > + dev->hw_features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
> > + dev->hw_features |= NETIF_F_GSO_SOFTWARE;
> > +
> > + /* MTU range: 68 - (something less than 65535) */
> > + dev->min_mtu = ETH_MIN_MTU;
> > + dev->max_mtu = IP_MAX_MTU - BAREUDP_BASE_HLEN - dev->hard_header_len;
>
> For IPv6, MSS is 64 kB without headers, so dev->max_mtu could perhaps
> be IP_MAX_MTU. Also, why is hard_header_len subtracted? Again, same
> for geneve, so if it works there, fine to leave here. Just something
> that surprised me and might be worth giving another thought.
Powered by blists - more mailing lists