netdev - Re: [PATCH net-next 2/7] mpls: Basic routing support

lists.openwall.net: Open Source and information security mailing list archives
Date:	Thu, 05 Mar 2015 12:42:17 -0600
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Vivek Venkatraman <vivek@...ulusnetworks.com>
Cc:	David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
	roopa <roopa@...ulusnetworks.com>,
	Stephen Hemminger <stephen@...workplumber.org>,
	santiago@...reenet.org, Simon Horman <horms@...ge.net.au>
Subject: Re: [PATCH net-next 2/7] mpls: Basic routing support

Vivek Venkatraman <vivek@...ulusnetworks.com> writes:

> On Tue, Mar 3, 2015 at 5:10 PM, Eric W. Biederman <ebiederm@...ssion.com> wrote:
>>
>> This change adds a new Kconfig option MPLS_ROUTING.
>>
>> The core of this change is the code that takes an mpls packet received
>> from another machine, looks that packet up in a routing table, and
>> forwards it on.
>>
>> Support for MPLS over ATM is not considered or attempted here.  This
>> implementation follows RFC3032 and implements the MPLS shim header that
>> can pass over essentially any network.
>>
>> What RFC3031 refers to as the Incoming Label Map (ILM) I call
>> net->mpls.platform_label[].  What RFC3031 refers to as the Next Hop
>> Label Forwarding Entry (NHLFE) I call mpls_route, though calling it the
>> label forwarding information base (LFIB) might also be valid.
>>
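To make the ILM/NHLFE mapping concrete, here is a minimal userspace sketch; it is not code from the patch. It assumes the ILM is a flat array indexed directly by the incoming label, in the spirit of net->mpls.platform_label[]. The types `struct nhlfe` and `struct ilm` and the helpers `ilm_insert()` and `ilm_lookup()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical NHLFE (cf. mpls_route): where and how to forward a label. */
struct nhlfe {
    const char *out_dev;   /* stand-in for rt_dev */
    size_t n_labels;       /* number of labels pushed in place of the old one */
    uint32_t labels[4];    /* outgoing labels, outermost first */
};

/* Hypothetical ILM (cf. net->mpls.platform_label[]): a flat array indexed
 * directly by the incoming label; a NULL slot means "no route". */
struct ilm {
    size_t size;
    struct nhlfe **table;
};

static void ilm_insert(struct ilm *ilm, uint32_t label, struct nhlfe *rt)
{
    assert(label < ilm->size);   /* the caller must stay inside the table */
    ilm->table[label] = rt;
}

/* Constant-time lookup: out-of-range labels and empty slots both yield
 * NULL, which a forwarding path would treat as "drop". */
static struct nhlfe *ilm_lookup(const struct ilm *ilm, uint32_t label)
{
    if (label >= ilm->size)
        return NULL;
    return ilm->table[label];
}
```

Indexing by label reduces the lookup to a bounds check and a single load, which suits a label space that is fixed at 20 bits.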
>
> This currently does not allow for ECMP when acting as a transit,
> correct?

Correct.  There is no fundamental reason for that, ECMP just has not
been implemented yet.

>> Further, the implementation forwards packets as described in RFC3032.
>> There is no need for, and given the original motivation for MPLS a
>> strong disincentive against, a flexible label forwarding path.  In
>> essence the logic is: the topmost label is read, looked up, removed,
>> and replaced by 0 or more new labels, and the packet is then sent out
>> the specified interface to its next hop.
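The swap step described above can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the patch's code. The label stack is modelled as an array of already-decoded entries with the top at index 0. `struct lse`, `struct route` and `forward_labels()` are hypothetical, and the caller is assumed to have already checked the TTL. As in mpls_forward(), the new labels are written innermost-first, so only the innermost pushed label can inherit the bottom-of-stack bit of the popped entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STACK 8

/* Hypothetical decoded label stack entry. */
struct lse { uint32_t label; uint8_t ttl; bool bos; };

/* Hypothetical NHLFE: the labels pushed in place of the popped one,
 * outermost first. */
struct route { size_t n_labels; uint32_t labels[MAX_STACK]; };

/* Pop in[0], push rt->labels, and copy the rest of the stack to out.
 * Returns the new stack depth. */
static size_t forward_labels(const struct lse *in, size_t depth,
                             const struct route *rt, struct lse *out)
{
    assert(depth >= 1 && depth - 1 + rt->n_labels <= MAX_STACK);
    uint8_t ttl = (uint8_t)(in[0].ttl - 1);  /* one decrement per hop */
    bool bos = in[0].bos;                    /* S bit of the popped entry */

    /* Innermost-first, so only the last pushed label can carry S=1. */
    for (size_t i = rt->n_labels; i-- > 0; ) {
        out[i].label = rt->labels[i];
        out[i].ttl = ttl;
        out[i].bos = bos;
        bos = false;
    }
    /* Entries that were below the popped label are kept unchanged. */
    for (size_t j = 1; j < depth; j++)
        out[rt->n_labels + j - 1] = in[j];
    return rt->n_labels + depth - 1;
}
```

A return value of 0 means the stack is now empty and the payload would go to IP egress. This corresponds to the penultimate hop popping branch in mpls_forward().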
>>
>> Quite a few optional features are not implemented here.  Among them
>> are generation of ICMP errors when the TTL is exceeded or the packet
>> is larger than the next hop MTU (those conditions are detected and the
>> packets are dropped instead of generating an ICMP error).  The traffic
>> class field is always set to 0.  The implementation focuses on IP over
>> MPLS and does not handle egress of other kinds of protocols.
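The "larger than the next hop MTU" condition can be written as a small predicate. This is a hedged sketch, not the patch's mpls_pkt_too_big(). The name `too_big_after_push()` and its parameters are hypothetical. The sketch compares the payload plus the labels to be pushed against the MTU using addition, so an MTU smaller than the new header cannot wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when a packet whose remaining length (after popping the incoming
 * label) is payload_len bytes no longer fits once new_header_size bytes
 * of labels are pushed for a link with the given MTU. */
static bool too_big_after_push(size_t payload_len, size_t new_header_size,
                               size_t mtu)
{
    assert(new_header_size % 4 == 0);  /* each label stack entry is 4 bytes */
    return payload_len + new_header_size > mtu;
}
```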
>>
>> Instead of implementing coordination with the neighbour table and
>> sorting out how to input next hops in a different address family (for
>> which there is value), I was lazy and implemented a next hop MAC
>> address instead.  The code is simpler, and there are flavors of MPLS,
>> such as MPLS-TP, where neither an IPv4 nor an IPv6 next hop is
>> appropriate, so a next hop by MAC address would need to be implemented
>> at some point.
>>
>
> I guess the above is no longer the case with this revised patch, which
> can support an IPv4 or IPv6 next hop too, right?

Correct.

>> Two new definitions, AF_MPLS and PF_MPLS, are exposed to userspace.
>>
>> Decoding the mpls header must be done by first byteswapping a 32-bit
>> big-endian word into the local CPU byte order and then bit shifting to
>> extract the pieces.  There is no C bit-field that can represent a
>> wire-format mpls header on a little-endian machine, as the low bits of
>> the 20-bit label wind up in the wrong half of the third byte.
>> Therefore, internally everything is dealt with in CPU-native byte order
>> except when writing to and reading from a packet.
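The byteswap-then-shift approach can be sketched in portable userspace C with ntohl() and htonl(). The mask and shift values follow the RFC 3032 label stack entry layout: 20-bit label, 3-bit traffic class, 1-bit bottom-of-stack, and 8-bit TTL. The names `struct lse`, `lse_decode()` and `lse_encode()` are hypothetical. They are not the kernel's mpls_entry_decode() and mpls_entry_encode().

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>      /* memcpy */

/* RFC 3032 label stack entry as a 32-bit big-endian word:
 *   | label (20) | TC (3) | S (1) | TTL (8) |
 */
#define LSE_LABEL_SHIFT 12
#define LSE_LABEL_MASK  0xFFFFF000u
#define LSE_TC_SHIFT    9
#define LSE_TC_MASK     0x00000E00u
#define LSE_S_MASK      0x00000100u
#define LSE_TTL_MASK    0x000000FFu

struct lse {
    uint32_t label;
    uint8_t tc;
    bool bos;
    uint8_t ttl;
};

/* Read 4 wire bytes, swap to host order once, then shift the fields out. */
static struct lse lse_decode(const uint8_t wire[4])
{
    uint32_t raw;
    memcpy(&raw, wire, sizeof(raw));     /* avoids unaligned loads */
    uint32_t e = ntohl(raw);
    struct lse d = {
        .label = (e & LSE_LABEL_MASK) >> LSE_LABEL_SHIFT,
        .tc    = (uint8_t)((e & LSE_TC_MASK) >> LSE_TC_SHIFT),
        .bos   = (e & LSE_S_MASK) != 0,
        .ttl   = (uint8_t)(e & LSE_TTL_MASK),
    };
    return d;
}

/* Build the word in host order, then swap to big-endian once on write. */
static void lse_encode(uint8_t wire[4], struct lse d)
{
    assert(d.label <= 0xFFFFF && d.tc <= 7);  /* fields must fit */
    uint32_t e = (d.label << LSE_LABEL_SHIFT) |
                 ((uint32_t)d.tc << LSE_TC_SHIFT) |
                 (d.bos ? LSE_S_MASK : 0u) |
                 d.ttl;
    uint32_t raw = htonl(e);
    memcpy(wire, &raw, sizeof(raw));
}
```

For example, label 16 with TC 0, S=1 and TTL 64 is the word 0x00010140, which is the bytes 00 01 01 40 on the wire. In the third byte, the high nibble holds the low 4 bits of the label and the low nibble holds TC plus S. A little-endian bit-field layout cannot express that split.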
>>
>> For management simplicity, if a label is configured to forward out
>> an interface that is down, the packet is dropped early.  Similarly,
>> if a network interface is removed, rt_dev is updated to NULL
>> (so no reference is preserved) and any packets for that label
>> are dropped.  Keeping the label entries in the kernel allows
>> the kernel label table to function as the definitive source
>> of which labels are allocated and which are not.
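The device-removal policy can be illustrated with a small userspace sketch that assumes a fixed-size table of route pointers. `struct dev`, `struct route`, `on_dev_unregister()`, `label_allocated()` and `should_forward()` are hypothetical stand-ins for the patch's rt_dev handling and mpls_output_possible(). They are not its actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dev { const char *name; bool up; };
struct route { struct dev *dev; };          /* dev plays the role of rt_dev */

#define N_LABELS 64
static struct route *table[N_LABELS];       /* NULL slot = label unallocated */

/* On unregister, clear the device reference but keep the entry, so the
 * label stays allocated in the table. */
static void on_dev_unregister(const struct dev *gone)
{
    assert(gone != NULL);
    for (size_t i = 0; i < N_LABELS; i++)
        if (table[i] && table[i]->dev == gone)
            table[i]->dev = NULL;
}

static bool label_allocated(size_t label)
{
    return label < N_LABELS && table[label] != NULL;
}

/* Forward only if the label has a route whose device still exists and
 * is up; otherwise the packet is dropped early. */
static bool should_forward(size_t label)
{
    if (!label_allocated(label))
        return false;
    const struct dev *d = table[label]->dev;
    return d != NULL && d->up;
}
```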
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@...ssion.com>
>> ---
>>  include/linux/socket.h      |   2 +
>>  include/net/net_namespace.h |   4 +
>>  include/net/netns/mpls.h    |  15 ++
>>  net/mpls/Kconfig            |   5 +
>>  net/mpls/Makefile           |   1 +
>>  net/mpls/af_mpls.c          | 349 ++++++++++++++++++++++++++++++++++++++++++++
>>  net/mpls/internal.h         |  56 +++++++
>>  7 files changed, 432 insertions(+)
>>  create mode 100644 include/net/netns/mpls.h
>>  create mode 100644 net/mpls/af_mpls.c
>>  create mode 100644 net/mpls/internal.h
>>
>> <snip>
>> +
>> +static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>> +                       struct packet_type *pt, struct net_device *orig_dev)
>> +{
>> +       struct net *net = dev_net(dev);
>> +       struct mpls_shim_hdr *hdr;
>> +       struct mpls_route *rt;
>> +       struct mpls_entry_decoded dec;
>> +       struct net_device *out_dev;
>> +       unsigned int hh_len;
>> +       unsigned int new_header_size;
>> +       unsigned int mtu;
>> +       int err;
>> +
>> +       /* Careful this entire function runs inside of an rcu critical section */
>> +
>> +       if (skb->pkt_type != PACKET_HOST)
>> +               goto drop;
>> +
>> +       if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
>> +               goto drop;
>> +
>> +       if (!pskb_may_pull(skb, sizeof(*hdr)))
>> +               goto drop;
>> +
>> +       /* Read and decode the label */
>> +       hdr = mpls_hdr(skb);
>> +       dec = mpls_entry_decode(hdr);
>> +
>> +       /* Pop the label */
>> +       skb_pull(skb, sizeof(*hdr));
>> +       skb_reset_network_header(skb);
>> +
>> +       skb_orphan(skb);
>> +
>> +       rt = mpls_route_input_rcu(net, dec.label);
>> +       if (!rt)
>> +               goto drop;
>> +
>> +       /* Find the output device */
>> +       out_dev = rt->rt_dev;
>> +       if (!mpls_output_possible(out_dev))
>> +               goto drop;
>> +
>> +       if (skb_warn_if_lro(skb))
>> +               goto drop;
>> +
>> +       skb_forward_csum(skb);
>> +
>> +       /* Verify ttl is valid */
>> +       if (dec.ttl <= 2)
>
> Why is this "<= 2"?

It appears I rewrote that section one too many times; it should be <= 1.

>> +               goto drop;
>> +       dec.ttl -= 1;
>> +
>> +       /* Verify the destination can hold the packet */
>> +       new_header_size = mpls_rt_header_size(rt);
>> +       mtu = mpls_dev_mtu(out_dev);
>> +       if (mpls_pkt_too_big(skb, mtu - new_header_size))
>> +               goto drop;
>> +
>> +       hh_len = LL_RESERVED_SPACE(out_dev);
>> +       if (!out_dev->header_ops)
>> +               hh_len = 0;
>> +
>> +       /* Ensure there is enough space for the headers in the skb */
>> +       if (skb_cow(skb, hh_len + new_header_size))
>> +               goto drop;
>> +
>> +       skb->dev = out_dev;
>> +       skb->protocol = htons(ETH_P_MPLS_UC);
>> +
>> +       if (unlikely(!new_header_size && dec.bos)) {
>> +               /* Penultimate hop popping */
>> +               if (!mpls_egress(rt, skb, dec))
>> +                       goto drop;
>> +       } else {
>> +               bool bos;
>> +               int i;
>> +               skb_push(skb, new_header_size);
>> +               skb_reset_network_header(skb);
>> +               /* Push the new labels */
>> +               hdr = mpls_hdr(skb);
>> +               bos = dec.bos;
>> +               for (i = rt->rt_labels - 1; i >= 0; i--) {
>> +                       hdr[i] = mpls_entry_encode(rt->rt_label[i], dec.ttl, 0, bos);
>> +                       bos = false;
>> +               }
>> +       }
>> +
>> +       err = neigh_xmit(rt->rt_via_family, out_dev, rt->rt_via, skb);
>> +       if (err)
>> +               net_dbg_ratelimited("%s: packet transmission failed: %d\n",
>> +                                   __func__, err);
>> +       return 0;
>> +
>> +drop:
>> +       kfree_skb(skb);
>> +       return NET_RX_DROP;
>> +}
>> +
>
> Vivek