Message-ID: <87d21dolyt.fsf@x220.int.ebiederm.org>
Date: Tue, 02 Jun 2015 16:10:34 -0500
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Robert Shearman <rshearma@...cade.com>
Cc: <netdev@...r.kernel.org>, roopa <roopa@...ulusnetworks.com>,
Thomas Graf <tgraf@...g.ch>
Subject: Re: [RFC net-next 0/3] IP imposition of per-nh MPLS encap

Robert Shearman <rshearma@...cade.com> writes:
> On 02/06/15 19:11, Eric W. Biederman wrote:
>> Robert Shearman <rshearma@...cade.com> writes:
>>
>>> In order to be able to function as a Label Edge Router in an MPLS
>>> network, it is necessary to be able to take IP packets and impose an
>>> MPLS encap and forward them out. The traditional approach of setting
>>> up an interface for each "tunnel" endpoint doesn't scale for the
>>> common MPLS use-cases where each IP route tends to be assigned a
>>> different label as encap.
>>>
>>> The solution suggested here for further discussion is to provide the
>>> facility to define encap data on a per-nexthop basis using a new
>>> netlink attribute, RTA_ENCAP, which would be opaque to the IPv4/IPv6
>>> forwarding code, but interpreted by the virtual interface assigned to
>>> the nexthop.
>>>
>>> A new ipmpls interface type is defined to show the use of this
>>> facility to allow IP packets to be imposed with an MPLS
>>> encap. However, the facility is designed to be general enough to be
>>> used by any encapsulation/tunneling mechanism that has similar
>>> requirements of high-scale, high-variation-of-encap.
>>
>> I am still digging into the details but adding a new network device to
>> make this possible is very undesirable.
>>
>> It is a pain point. Those network devices get to be a major source of
>> memory consumption when there are 4K network namespaces in existence.
>>
>> It is conceptually wrong. The network device will never be used as an
>> ordinary network device. All the network device gives you is the
>> ability to avoid creating an enumeration of different kinds of
>> encapsulation.
>
> This isn't true. The network device also gives some of the things you
> take for granted. Things like fragmentation through specifying the mtu
> on the shared tunnel device, being able to specify rules using the
> shared tunnel output device, IP stats, and the ability to specify a
> different destination namespace.

Granted, you get a few more things. It is still conceptually wrong, as
the network device will never be used as an ordinary network device.

Fragmentation is already silly because we are talking about multiple
tunnels with different properties. You need per-route mtu to handle
that case.
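
A minimal sketch of why a single MTU on a shared device cannot fit every
tunnel, assuming the "different properties" include the number of MPLS
labels imposed per route. The route table, the 1500-byte link MTU, and the
helper name payload_mtu are hypothetical; the 4-byte label stack entry size
is from RFC 3032.

```python
MPLS_LSE_BYTES = 4  # one MPLS label stack entry is 4 bytes (RFC 3032)


def payload_mtu(link_mtu: int, label_count: int) -> int:
    """Largest IP packet that still fits after pushing label_count labels."""
    if label_count < 0:
        raise ValueError("label_count must be non-negative")
    return link_mtu - MPLS_LSE_BYTES * label_count


# Hypothetical routes: prefix -> number of labels imposed on that route.
routes = {"10.0.1.0/24": 1, "10.0.2.0/24": 3}
LINK_MTU = 1500  # assumed Ethernet link MTU

# Each route ends up with its own usable MTU, so one device-wide value
# is either too large for some routes or too small for others.
per_route_mtu = {prefix: payload_mtu(LINK_MTU, n) for prefix, n in routes.items()}
print(per_route_mtu)
```

With these assumed values the two routes get usable MTUs of 1496 and 1488
bytes, which is the mismatch a per-route mtu would resolve.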

Further, I am not saying you don't need an output device (which is what
is needed to specify a different destination namespace). I am saying that
having a funny mpls device is wrong as far as I can see. Certainly it
is a lot of bloody unnecessary overhead.

If we are going to design for maximum scaling (and 1 million+ routes
sounds like maximum scaling), we should see how far we can go without
dragging in the horrible heaviness of additional network devices: 35K
apiece, last I measured it. Just a small handful of them is already a
scaling issue for network namespaces.
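
As rough arithmetic on that overhead, the sketch below multiplies the
per-device cost by the 4K network namespaces mentioned earlier in the
thread. It assumes "35K" means 35 KiB and "4K" means 4096; the function
name and the choice of 1, 2, and 4 extra devices are illustrative only.

```python
DEVICE_BYTES = 35 * 1024  # "35K apiece", read here as 35 KiB (assumption)
NAMESPACES = 4 * 1024     # "4K network namespaces", read here as 4096 (assumption)


def device_overhead_mib(devices_per_ns: int) -> float:
    """Total memory, in MiB, for devices_per_ns extra devices in every namespace."""
    return DEVICE_BYTES * NAMESPACES * devices_per_ns / (1024 * 1024)


for n in (1, 2, 4):
    print(f"{n} extra device(s) per namespace: {device_overhead_mib(n):.0f} MiB")
```

Under these assumptions, a single extra device type per namespace already
costs about 140 MiB across the system, before any routes are installed.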

Eric