[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+FuTScS+fm_scnm5qkU4wtV+FAW8XkC4OfwCbLOxuPz1YipNw@mail.gmail.com>
Date: Thu, 17 Oct 2019 16:48:26 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Martin Varghese <martinvarghesenokia@...il.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Network Development <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>, corbet@....net,
scott.drennan@...ia.com, Jiri Benc <jbenc@...hat.com>,
martin.varghese@...ia.com
Subject: Re: [PATCH net-next 1/2] UDP tunnel encapsulation module for
tunnelling different protocols like MPLS,IP,NSH etc.
On Thu, Oct 17, 2019 at 9:20 AM Martin Varghese
<martinvarghesenokia@...il.com> wrote:
>
> On Tue, Oct 08, 2019 at 12:28:23PM -0400, Willem de Bruijn wrote:
> > On Tue, Oct 8, 2019 at 5:51 AM Martin Varghese
> > <martinvarghesenokia@...il.com> wrote:
> > >
> > > From: Martin <martin.varghese@...ia.com>
> > >
> > > The Bareudp tunnel module provides a generic L3 encapsulation
> > > tunnelling module for tunnelling different protocols like MPLS,
> > > IP,NSH etc inside a UDP tunnel.
> > >
> > > Signed-off-by: Martin Varghese <martinvarghesenokia@...il.com>
> > > ---
> > > +static int bareudp_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
> > > +{
> >
> >
> > > + skb_push(skb, sizeof(struct ethhdr));
> > > + eh = (struct ethhdr *)skb->data;
> > > + eh->h_proto = proto;
> > > +
> > > + skb_reset_mac_header(skb);
> > > + skb->protocol = eth_type_trans(skb, bareudp->dev);
> > > + skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
> > > + oiph = skb_network_header(skb);
> > > + skb_reset_network_header(skb);
> > > +
> > > + if (bareudp_get_sk_family(bs) == AF_INET)
> >
> > This should be derived from packet contents, not socket state.
> > Although the one implies the other, I imagine.
> >
>
> The IP Stack check IP headers & puts the packet in the correct socket, hence checking the ip headers again is reduntant correct?
This parses the inner packet after decapsulation. The protocol stack
has selected the socket based on the outer packet, right?
I guess the correctness comes from the administrator having configured
the bareudp for this protocol type, so implicitly guarantees that no
other inner packets will appear.
Also, the oiph pointer is a bit fragile now that a new mac header is
constructed in the space that used to hold the encapsulation headers.
I suppose it only updates eth->h_proto, which lies in the former udp
header. More fundamentally, is moving the mac header needed at all, if
the stack correctly uses skb_mac_header whenever it accesses also
after decapsulation?
> In geneve & vxlan it is done the same way.
>
>
> > > +static struct rtable *bareudp_get_v4_rt(struct sk_buff *skb,
> > > + struct net_device *dev,
> > > + struct bareudp_sock *bs4,
> > > + struct flowi4 *fl4,
> > > + const struct ip_tunnel_info *info)
> > > +{
> > > + bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
> > > + struct bareudp_dev *bareudp = netdev_priv(dev);
> > > + struct dst_cache *dst_cache;
> > > + struct rtable *rt = NULL;
> > > + __u8 tos;
> > > +
> > > + if (!bs4)
> > > + return ERR_PTR(-EIO);
> > > +
> > > + memset(fl4, 0, sizeof(*fl4));
> > > + fl4->flowi4_mark = skb->mark;
> > > + fl4->flowi4_proto = IPPROTO_UDP;
> > > + fl4->daddr = info->key.u.ipv4.dst;
> > > + fl4->saddr = info->key.u.ipv4.src;
> > > +
> > > + tos = info->key.tos;
> > > + fl4->flowi4_tos = RT_TOS(tos);
> > > +
> > > + dst_cache = (struct dst_cache *)&info->dst_cache;
> > > + if (use_cache) {
> > > + rt = dst_cache_get_ip4(dst_cache, &fl4->saddr);
> > > + if (rt)
> > > + return rt;
> > > + }
> > > + rt = ip_route_output_key(bareudp->net, fl4);
> > > + if (IS_ERR(rt)) {
> > > + netdev_dbg(dev, "no route to %pI4\n", &fl4->daddr);
> > > + return ERR_PTR(-ENETUNREACH);
> > > + }
> > > + if (rt->dst.dev == dev) { /* is this necessary? */
> > > + netdev_dbg(dev, "circular route to %pI4\n", &fl4->daddr);
> > > + ip_rt_put(rt);
> > > + return ERR_PTR(-ELOOP);
> > > + }
> > > + if (use_cache)
> > > + dst_cache_set_ip4(dst_cache, &rt->dst, fl4->saddr);
> > > + return rt;
> > > +}
> > > +
> > > +#if IS_ENABLED(CONFIG_IPV6)
> > > +static struct dst_entry *bareudp_get_v6_dst(struct sk_buff *skb,
> > > + struct net_device *dev,
> > > + struct bareudp_sock *bs6,
> > > + struct flowi6 *fl6,
> > > + const struct ip_tunnel_info *info)
> > > +{
> > > + bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
> > > + struct bareudp_dev *bareudp = netdev_priv(dev);
> > > + struct dst_entry *dst = NULL;
> > > + struct dst_cache *dst_cache;
> > > + __u8 prio;
> > > +
> > > + if (!bs6)
> > > + return ERR_PTR(-EIO);
> > > +
> > > + memset(fl6, 0, sizeof(*fl6));
> > > + fl6->flowi6_mark = skb->mark;
> > > + fl6->flowi6_proto = IPPROTO_UDP;
> > > + fl6->daddr = info->key.u.ipv6.dst;
> > > + fl6->saddr = info->key.u.ipv6.src;
> > > + prio = info->key.tos;
> > > +
> > > + fl6->flowlabel = ip6_make_flowinfo(RT_TOS(prio),
> > > + info->key.label);
> > > + dst_cache = (struct dst_cache *)&info->dst_cache;
> > > + if (use_cache) {
> > > + dst = dst_cache_get_ip6(dst_cache, &fl6->saddr);
> > > + if (dst)
> > > + return dst;
> > > + }
> > > + if (ipv6_stub->ipv6_dst_lookup(bareudp->net, bs6->sock->sk, &dst,
> > > + fl6)) {
> > > + netdev_dbg(dev, "no route to %pI6\n", &fl6->daddr);
> > > + return ERR_PTR(-ENETUNREACH);
> > > + }
> > > + if (dst->dev == dev) { /* is this necessary? */
> > > + netdev_dbg(dev, "circular route to %pI6\n", &fl6->daddr);
> > > + dst_release(dst);
> > > + return ERR_PTR(-ELOOP);
> > > + }
> > > +
> > > + if (use_cache)
> > > + dst_cache_set_ip6(dst_cache, dst, &fl6->saddr);
> > > + return dst;
> > > +}
> > > +#endif
> >
> > The route lookup logic is very similar to vxlan_get_route and
> > vxlan6_get_route. Can be reused?
>
> I had a look at the vxlan & geneve and it seems the corresponding functions in those modules are tightly coupled to the rest of the module design.
> More specifically wrt the ttl inheritance & the caching behaviour. It may not be possible for those modules to use a new generic API unless without a change in those module design.
bareudp_get_v4_rt is identical to geneve_get_v4_rt down to the comment
aside from
if ((tos == 1) && !geneve->collect_md) {
tos = ip_tunnel_get_dsfield(ip_hdr(skb), skb);
use_cache = false;
}
Same for bareudp_get_v6_dst and geneve_get_v6_dst.
Worst case that one branch could be made conditional on a boolean
argument? Maybe this collect_md part (eventually) makes sense to
bareudp, as well.
> The bareudp module is a generic L3 encapsulation module. It could be used to tunnel different l3 protocols. TTL Inheritance behaviour when tunnelled
> could be different for these inner protocols. Hence moving this function to a common place will make it tough to change it later when a need arises for a new protocol
>
> Otherwise we should have more generic function which takes the generic IP header params as arguments. Then the point is we don’t need a function like that
> We can just fill up "struct flowi4" and call ip_route_output_key or dst_cache_get_ip4 to get the route table entry
>
>
> Thanks
> Martin
Powered by blists - more mailing lists