Date:	Tue, 2 Jun 2015 23:15:39 +0100
From:	Robert Shearman <rshearma@...cade.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
CC:	<netdev@...r.kernel.org>, roopa <roopa@...ulusnetworks.com>,
	Thomas Graf <tgraf@...g.ch>
Subject: Re: [RFC net-next 0/3] IP imposition of per-nh MPLS encap

On 02/06/15 22:10, Eric W. Biederman wrote:
> Robert Shearman <rshearma@...cade.com> writes:
>
>> On 02/06/15 19:11, Eric W. Biederman wrote:
>>> Robert Shearman <rshearma@...cade.com> writes:
>>>
>>>> In order to be able to function as a Label Edge Router in an MPLS
>>>> network, it is necessary to be able to take IP packets and impose an
>>>> MPLS encap and forward them out. The traditional approach of setting
>>>> up an interface for each "tunnel" endpoint doesn't scale for the
>>>> common MPLS use-cases where each IP route tends to be assigned a
>>>> different label as encap.
>>>>
>>>> The solution suggested here for further discussion is to provide the
>>>> facility to define encap data on a per-nexthop basis using a new
>>>> netlink attribute, RTA_ENCAP, which would be opaque to the IPv4/IPv6
>>>> forwarding code, but interpreted by the virtual interface assigned to
>>>> the nexthop.
>>>>
>>>> A new ipmpls interface type is defined to show the use of this
>>>> facility to allow IP packets to be imposed with an MPLS
>>>> encap. However, the facility is designed to be general enough to be
>>>> used by any encapsulation/tunneling mechanism that has similar
>>>> requirements of high-scale, high-variation-of-encap.
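
For concreteness, a userspace sketch of installing one such route is
below; the RTA_ENCAP attribute number, the label blob format, and the
ifindex are placeholders for illustration, not a final uAPI:

/* Sketch: build (not send) an RTM_NEWROUTE whose nexthop carries an
 * opaque RTA_ENCAP blob -- here a single MPLS label-stack entry --
 * plus RTA_OIF pointing at the ipmpls device.  Attribute number and
 * blob layout are made up for the example. */
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#ifndef RTA_ENCAP
#define RTA_ENCAP 22			/* hypothetical attribute number */
#endif

static struct {
	struct nlmsghdr nlh;
	struct rtmsg    rtm;
	char            attrs[256];
} req;

/* Append one rtattr to the request, keeping nlmsg_len aligned. */
static void add_rta(unsigned short type, const void *data, int len)
{
	struct rtattr *rta = (struct rtattr *)
		((char *)&req + NLMSG_ALIGN(req.nlh.nlmsg_len));

	rta->rta_type = type;
	rta->rta_len  = RTA_LENGTH(len);
	memcpy(RTA_DATA(rta), data, len);
	req.nlh.nlmsg_len = NLMSG_ALIGN(req.nlh.nlmsg_len) +
			    RTA_ALIGN(rta->rta_len);
}

int main(void)
{
	struct in_addr dst, gw;
	int oif = 5;			/* ifindex of the ipmpls device, say */
	/* label 200, bottom-of-stack, TTL 64 (RFC 3032 layout) */
	__u32 lse = htonl(200 << 12 | 1 << 8 | 64);

	inet_pton(AF_INET, "203.0.113.0", &dst);
	inet_pton(AF_INET, "198.51.100.1", &gw);

	req.nlh.nlmsg_len    = NLMSG_LENGTH(sizeof(struct rtmsg));
	req.nlh.nlmsg_type   = RTM_NEWROUTE;
	req.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_CREATE;
	req.rtm.rtm_family   = AF_INET;
	req.rtm.rtm_dst_len  = 24;
	req.rtm.rtm_table    = RT_TABLE_MAIN;
	req.rtm.rtm_protocol = RTPROT_STATIC;
	req.rtm.rtm_scope    = RT_SCOPE_UNIVERSE;
	req.rtm.rtm_type     = RTN_UNICAST;

	add_rta(RTA_DST,     &dst, sizeof(dst));
	add_rta(RTA_GATEWAY, &gw,  sizeof(gw));
	add_rta(RTA_OIF,     &oif, sizeof(oif));
	add_rta(RTA_ENCAP,   &lse, sizeof(lse));  /* opaque to IPv4/v6 code */

	/* ...would be sent over a NETLINK_ROUTE socket as usual... */
	printf("built %u-byte RTM_NEWROUTE with RTA_ENCAP\n",
	       req.nlh.nlmsg_len);
	return 0;
}
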
>>>
>>> I am still digging into the details but adding a new network device to
>>> make this possible is very undesirable.
>>>
>>> It is a pain point.  Those network devices get to be a major source of
>>> memory consumption when there are 4K network namespaces in existence.
>>>
>>> It is conceptually wrong.  The network device will never be used as an
>>> ordinary network device.  All the network device gives you is the
>>> ability to avoid creating an enumeration of different kinds of
>>> encapsulation.
>>
>> This isn't true. The network device also gives you some of the things
>> you take for granted: fragmentation through specifying the mtu on the
>> shared tunnel device, being able to specify rules using the shared
>> tunnel output device, IP stats, and the ability to specify a
>> different destination namespace.
>
> Granted you get a few more things.  It is still conceptually wrong, as
> the network device will never be used as an ordinary network device.
>
> Fragmentation is already silly because we are talking about multiple
> tunnels with different properties.  You need per-route mtu to handle
> that case.

It's unlikely you'll have a huge variation in the mtus across routes,
unless you're running in an ISP environment. In the example use-cases
we've got in hand, it's highly likely there'll only be a handful of
different mtus, if that.
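
And where a per-route mtu really is wanted, it can already be expressed
through the existing RTAX_MTU route metric, independent of any encap
device, e.g. with iproute2:

  ip route add 192.0.2.0/24 via 198.51.100.1 mtu 1400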

> Further, I am not saying you don't need an output device (which is what
> is needed to specify a different destination namespace); I am saying that
> having a funny mpls device is wrong as far as I can see.  Certainly it
> is a lot of bloody unnecessary overhead.
>
> If we are going to design for maximum scaling (and 1 million+ routes
> sounds like maximum scaling), we should see how far we can go without
> dragging in the horrible heaviness of additional network devices.  35K a
> piece last I measured it.  Just a small handful of them are already
> scaling issues for network namespaces.

For the ipmpls interface I've implemented here, you only need one per 
namespace. You could argue the same for the veth interfaces which would 
be much more commonly used in network namespaces.

BTW, maybe I've missed something, or maybe netdevs have gone on a diet, 
but I count the cost of creating a basic interface at ~2700 bytes on x86_64:
  sizeof(struct net_device)             /* 2112 */
  + 1 * sizeof(struct netdev_queue)     /*  384 */
  + 1 * sizeof(struct netdev_rx_queue)  /*  128 */
  + sizeof(struct netdev_hw_addr)       /*   80 */
  + sizeof(int) * nr_poss_cpus          /* 4 * n */
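
If anyone wants to re-measure on their own config, a throwaway module
along these lines prints the pieces (sizes vary a lot with config and
kernel version, and the per-cpu ints are assumed here to be the
refcount):

/* netdev_cost.c: print the per-netdev cost components, then refuse
 * to load so nothing needs unloading afterwards. */
#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/cpumask.h>

static int __init netdev_cost_init(void)
{
	pr_info("net_device=%zu netdev_queue=%zu netdev_rx_queue=%zu "
		"netdev_hw_addr=%zu pcpu_ints=%zu\n",
		sizeof(struct net_device),
		sizeof(struct netdev_queue),
		sizeof(struct netdev_rx_queue),
		sizeof(struct netdev_hw_addr),
		sizeof(int) * num_possible_cpus());
	return -ENODEV;		/* never actually stay loaded */
}
module_init(netdev_cost_init);

MODULE_LICENSE("GPL");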

Thanks,
Rob