Date:	Thu, 5 Mar 2015 08:36:58 -0800
From:	Vivek Venkatraman <vivek@...ulusnetworks.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc:	David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
	roopa <roopa@...ulusnetworks.com>,
	Stephen Hemminger <stephen@...workplumber.org>,
	santiago@...reenet.org, Simon Horman <horms@...ge.net.au>
Subject: Re: [PATCH net-next 2/7] mpls: Basic routing support

On Tue, Mar 3, 2015 at 5:10 PM, Eric W. Biederman <ebiederm@...ssion.com> wrote:
>
> This change adds a new Kconfig option MPLS_ROUTING.
>
> The core of this change is the code to look at an mpls packet received
> from another machine.  Look that packet up in a routing table and
> forward the packet on.
>
> Support of MPLS over ATM is not considered or attempted here.  This
> implementation follows RFC3032 and implements the MPLS shim header that
> can pass over essentially any network.
>
> What RFC3031 refers to as the Incoming Label Map (ILM) I call
> net->mpls.platform_label[].  What RFC3031 refers to as the Next Hop
> Label Forwarding Entry (NHLFE) I call mpls_route.  Though calling it the
> label forwarding information base (lfib) might also be valid.
>

This currently does not allow for ECMP when acting as a transit, correct?

> Further the implementation forwards packets as described in RFC3032.
> There is no need and given the original motivation for MPLS a strong
> disincentive to have a flexible label forwarding path.  In essence
> the logic is the topmost label is read, looked up, removed, and
> replaced by 0 or more new labels and then sent out the specified
> interface to its next hop.
>
> Quite a few optional features are not implemented here.  Among them
> are generation of ICMP errors when the TTL is exceeded or the packet
> is larger than the next hop MTU (those conditions are detected and the
> packets are dropped instead of generating an icmp error).  The traffic
> class field is always set to 0.  The implementation focuses on IP over
> MPLS and does not handle egress of other kinds of protocols.
>
> Instead of implementing coordination with the neighbour table and
> sorting out how to input next hops in a different address family (for
> which there is value), I was lazy and implemented a next hop mac
> address instead.  The code is simpler and there are flavors of MPLS
> such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is
> appropriate so a next hop by mac address would need to be implemented
> at some point.
>

I guess the above is no longer the case with this revised patch, which
can support an IPv4 or IPv6 next hop too, right?

> Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.
>
> Decoding the mpls header must be done by first byteswapping a 32bit big
> endian word into the local cpu endian and then bit shifting to extract
> the pieces.  There is no C bit-field that can represent a wire format
> mpls header on a little endian machine as the low bits of the 20bit
> label wind up in the wrong half of the third byte.  Therefore internally
> everything is dealt with in cpu native byte order except when writing
> to and reading from a packet.
>
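
For readers following along, the byteswap-then-shift decode described above can be sketched in userspace C. This is only an illustration under the RFC3032 shim layout (20-bit label, 3-bit traffic class, 1-bit bottom of stack, 8-bit TTL); the names mpls_decoded and mpls_decode are hypothetical and are not the patch's mpls_entry_decode().

```c
#include <arpa/inet.h>  /* ntohl */
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of a decoded shim entry (not the kernel struct). */
struct mpls_decoded {
	uint32_t label;  /* 20 bits */
	uint8_t  tc;     /* 3 bits, traffic class */
	uint8_t  bos;    /* 1 bit, bottom of stack */
	uint8_t  ttl;    /* 8 bits */
};

/* Decode one 4-byte shim entry taken straight from the wire. */
static struct mpls_decoded mpls_decode(const uint8_t wire[4])
{
	struct mpls_decoded d;
	uint32_t be, entry;

	/* Copy out of the packet to avoid unaligned access, then convert
	 * the big endian word to cpu byte order before any shifting. */
	memcpy(&be, wire, sizeof(be));
	entry = ntohl(be);

	d.label = (entry >> 12) & 0xFFFFF;
	d.tc    = (entry >> 9) & 0x7;
	d.bos   = (entry >> 8) & 0x1;
	d.ttl   = entry & 0xFF;
	return d;
}
```

Feeding it the bytes 00 06 41 40 should give label 100, traffic class 0, bottom of stack set, and TTL 64. On a little endian host the label straddles the byte boundary in the third byte, which is why a shift on the swapped word is used here rather than a bit-field.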
> For management simplicity if a label is configured to forward out
> an interface that is down the packet is dropped early.  Similarly
> if a network interface is removed rt_dev is updated to NULL
> (so no reference is preserved) and any packets for that label
> are dropped.  Keeping the label entries in the kernel allows
> are dropped.  Keeping the label entries in the kernel allows
> the kernel label table to function as the definitive source
> of which labels are allocated and which are not.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@...ssion.com>
> ---
>  include/linux/socket.h      |   2 +
>  include/net/net_namespace.h |   4 +
>  include/net/netns/mpls.h    |  15 ++
>  net/mpls/Kconfig            |   5 +
>  net/mpls/Makefile           |   1 +
>  net/mpls/af_mpls.c          | 349 ++++++++++++++++++++++++++++++++++++++++++++
>  net/mpls/internal.h         |  56 +++++++
>  7 files changed, 432 insertions(+)
>  create mode 100644 include/net/netns/mpls.h
>  create mode 100644 net/mpls/af_mpls.c
>  create mode 100644 net/mpls/internal.h
>
> <snip>
> +
> +static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
> +                       struct packet_type *pt, struct net_device *orig_dev)
> +{
> +       struct net *net = dev_net(dev);
> +       struct mpls_shim_hdr *hdr;
> +       struct mpls_route *rt;
> +       struct mpls_entry_decoded dec;
> +       struct net_device *out_dev;
> +       unsigned int hh_len;
> +       unsigned int new_header_size;
> +       unsigned int mtu;
> +       int err;
> +
> +       /* Careful this entire function runs inside of an rcu critical section */
> +
> +       if (skb->pkt_type != PACKET_HOST)
> +               goto drop;
> +
> +       if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
> +               goto drop;
> +
> +       if (!pskb_may_pull(skb, sizeof(*hdr)))
> +               goto drop;
> +
> +       /* Read and decode the label */
> +       hdr = mpls_hdr(skb);
> +       dec = mpls_entry_decode(hdr);
> +
> +       /* Pop the label */
> +       skb_pull(skb, sizeof(*hdr));
> +       skb_reset_network_header(skb);
> +
> +       skb_orphan(skb);
> +
> +       rt = mpls_route_input_rcu(net, dec.label);
> +       if (!rt)
> +               goto drop;
> +
> +       /* Find the output device */
> +       out_dev = rt->rt_dev;
> +       if (!mpls_output_possible(out_dev))
> +               goto drop;
> +
> +       if (skb_warn_if_lro(skb))
> +               goto drop;
> +
> +       skb_forward_csum(skb);
> +
> +       /* Verify ttl is valid */
> +       if (dec.ttl <= 2)

Why is this "<= 2"?
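
To make the question concrete, here is a small userspace sketch comparing the check as posted with a decrement-then-test variant. The second function reflects my reading of the RFC3032 TTL rule (outgoing TTL is one less than incoming, and a packet whose outgoing TTL is 0 is not forwarded); it is an assumption for comparison, not something taken from the patch, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the quoted check: drop when the incoming TTL is <= 2,
 * otherwise forward with the TTL decremented by one. */
static bool ttl_check_as_posted(uint8_t in_ttl, uint8_t *out_ttl)
{
	if (in_ttl <= 2)
		return false;       /* dropped */
	*out_ttl = in_ttl - 1;
	return true;                /* forwarded */
}

/* Assumed alternative: drop only when the decremented TTL would be 0,
 * that is, when the incoming TTL is 0 or 1. */
static bool ttl_check_decrement_first(uint8_t in_ttl, uint8_t *out_ttl)
{
	if (in_ttl <= 1)
		return false;       /* outgoing TTL would be 0 */
	*out_ttl = in_ttl - 1;
	return true;
}
```

The two differ only for an incoming TTL of 2: the posted check drops that packet, while the decrement-first variant would forward it with a TTL of 1.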

> +               goto drop;
> +       dec.ttl -= 1;
> +
> +       /* Verify the destination can hold the packet */
> +       new_header_size = mpls_rt_header_size(rt);
> +       mtu = mpls_dev_mtu(out_dev);
> +       if (mpls_pkt_too_big(skb, mtu - new_header_size))
> +               goto drop;
> +
> +       hh_len = LL_RESERVED_SPACE(out_dev);
> +       if (!out_dev->header_ops)
> +               hh_len = 0;
> +
> +       /* Ensure there is enough space for the headers in the skb */
> +       if (skb_cow(skb, hh_len + new_header_size))
> +               goto drop;
> +
> +       skb->dev = out_dev;
> +       skb->protocol = htons(ETH_P_MPLS_UC);
> +
> +       if (unlikely(!new_header_size && dec.bos)) {
> +               /* Penultimate hop popping */
> +               if (!mpls_egress(rt, skb, dec))
> +                       goto drop;
> +       } else {
> +               bool bos;
> +               int i;
> +               skb_push(skb, new_header_size);
> +               skb_reset_network_header(skb);
> +               /* Push the new labels */
> +               hdr = mpls_hdr(skb);
> +               bos = dec.bos;
> +               for (i = rt->rt_labels - 1; i >= 0; i--) {
> +                       hdr[i] = mpls_entry_encode(rt->rt_label[i], dec.ttl, 0, bos);
> +                       bos = false;
> +               }
> +       }
> +
> +       err = neigh_xmit(rt->rt_via_family, out_dev, rt->rt_via, skb);
> +       if (err)
> +               net_dbg_ratelimited("%s: packet transmission failed: %d\n",
> +                                   __func__, err);
> +       return 0;
> +
> +drop:
> +       kfree_skb(skb);
> +       return NET_RX_DROP;
> +}
> +
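
The label push loop in the quoted function can be tried out in userspace with a sketch like the one below. It assumes the RFC3032 shim layout and writes into a plain byte buffer instead of an skb; shim_encode and push_labels are hypothetical names, not the patch's mpls_entry_encode().

```c
#include <arpa/inet.h>  /* htonl */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack one shim entry in cpu order, then convert to wire (big endian) order. */
static uint32_t shim_encode(uint32_t label, uint8_t ttl, uint8_t tc, bool bos)
{
	uint32_t entry = ((label & 0xFFFFF) << 12) |
			 ((uint32_t)(tc & 0x7) << 9) |
			 ((uint32_t)(bos ? 1 : 0) << 8) |
			 ttl;
	return htonl(entry);
}

/* Write n labels into out (which must hold 4 * n bytes). As in the quoted
 * loop, iterate from the last label to the first so that only the innermost
 * entry inherits the incoming bottom-of-stack bit; traffic class is 0. */
static void push_labels(uint8_t *out, const uint32_t *labels, size_t n,
			uint8_t ttl, bool bos)
{
	for (size_t i = n; i-- > 0; ) {
		uint32_t be = shim_encode(labels[i], ttl, 0, bos);

		memcpy(out + 4 * i, &be, sizeof(be));
		bos = false;
	}
}
```

Pushing labels {100, 200} with TTL 63 and the bottom-of-stack bit set should produce the bytes 00 06 40 3f 00 0c 81 3f: the outer entry (label 100) has the bit clear and the inner entry (label 200) carries it.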

Vivek