netdev - Re: [RFC PATCH 00/29] net: VRF support

Message-ID: <20150210005344.GA6293@casper.infradead.org>
Date:	Tue, 10 Feb 2015 00:53:44 +0000
From:	Thomas Graf <tgraf@...g.ch>
To:	David Ahern <dsahern@...il.com>
Cc:	netdev@...r.kernel.org, ebiederm@...ssion.com
Subject: Re: [RFC PATCH 00/29] net: VRF support

On 02/04/15 at 06:34pm, David Ahern wrote:
> Namespaces provide excellent separation of the networking stack from the
> netdevices and up. The intent of VRFs is to provide an additional,
> logical separation at the L3 layer within a namespace.

What you ask for seems to be L3 micro-segmentation inside a netns. I
would argue that we already support this through multiple routing
tables. I would prefer improving the existing architecture to cover
your use cases: increase the number of supported tables, extend
routing rules as needed, ...
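
As a rough sketch of what this looks like with the existing tooling:
one routing table per segment, plus a rule that selects the table by
incoming device. The table id (99), device (eth1), prefix and gateway
are made-up values for illustration, and the script only prints the
commands unless DRY_RUN=0 is set (running them needs root).

```shell
#!/bin/sh
# Sketch only: emulate a "VRF 99" with a dedicated routing table.
# Table id, device name, prefix and gateway are hypothetical values.
# Commands are printed, not executed, unless DRY_RUN=0 (needs root).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_table() {
    table="$1"; dev="$2"; prefix="$3"; gw="$4"
    # Routes for this segment live in their own table.
    run ip route add "$prefix" dev "$dev" table "$table"
    run ip route add default via "$gw" dev "$dev" table "$table"
    # Packets arriving on $dev are looked up in that table.
    run ip rule add iif "$dev" lookup "$table"
}

setup_table 99 eth1 10.0.99.0/24 10.0.99.1
```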

> The VRF id of tasks defaults to 1 and is inherited parent to child. It can
> be read via the file '/proc/<pid>/vrf' and can be changed anytime by writing
> to this file (if preferred this can be made a prctl to change the VRF id).
> This allows services to be launched in a VRF context using ip, similar to
> what is done for network namespaces.
>     e.g., ip vrf exec 99 /usr/sbin/sshd

I think such a classification should occur through cgroups instead
of touching PIDs directly.

> Network devices belong to a single VRF context which defaults to VRF 1.
> They can be assigned to another VRF using IFLA_VRF attribute in link
> messages. Similarly the VRF assignment is returned in the IFLA_VRF
> attribute. The ip command has been modified to display the VRF id of a
> device. L2 applications like lldp are not VRF aware and still work
> through all network devices within the namespace.

I believe that binding net_devices to VRFs is misleading and the
concept by itself is non-scalable. You do not want to create 10k
net_devices for your overlay of choice just to tie them to a
particular VRF. You want to store the VRF identifier as metadata and
have a stateless classifier include it in the VRF decision. See the
recent VXLAN-GBP work.

You could either map whatever selects the VRF to the mark or support it
natively in the routing rules classifier.

An obvious alternative is OVS. What you describe can be implemented in
a scalable manner using OVS and mark. I understand that OVS is not for
everybody but it gets a fundamental principle right: scalability
demands programmability.

I don’t think we should be adding a new single-purpose metadata field
to arbitrary structures for every new use case that comes up. We
should work on programmability which increases flexibility and allows
decoupling application interest from networking details.

> On RX skbs get their VRF context from the netdevice the packet is received
> on. For TX the VRF context for an skb is taken from the socket. The
> intention is for L3/raw sockets to be able to set the VRF context for a
> packet TX using cmsg (not coded in this patch set).

Specifying L3 context in cmsg seems very broken to me. We do not want
to bind applications any closer to underlying networking infrastructure.
In fact, we should do the opposite and decouple this completely.

> The 'any' context applies to listen sockets only; connected sockets are in
> a VRF context. Child sockets accepted by the daemon acquire the VRF context
> of the network device the connection originated on.

Linux considers an address local regardless of the interface the packet
was received on.  So you would accept the packet on any interface and
then bind it to the VRF of that interface even though the route for it
might be on a different interface.

From my perspective, this really belongs in routing rules that take
the mark and the cgroup context into account.
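
A minimal sketch of that combination, assuming the net_cls cgroup
controller and the iptables cgroup match are available: locally
generated packets from a cgroup are tagged with a mark, and a routing
rule maps the mark to a table. The classid (0x00100001), mark (99) and
table (99) are hypothetical values, and the script only prints the
commands unless DRY_RUN=0 is set (running them needs root).

```shell
#!/bin/sh
# Sketch only: steer traffic from one net_cls cgroup into its own table.
# Classid, mark and table id are hypothetical values.
# Commands are printed, not executed, unless DRY_RUN=0 (needs root).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

bind_cgroup_to_table() {
    classid="$1"; mark="$2"; table="$3"
    # Tag locally generated packets from the cgroup with a mark.
    run iptables -t mangle -A OUTPUT -m cgroup --cgroup "$classid" \
        -j MARK --set-mark "$mark"
    # Route marked packets through the dedicated table.
    run ip rule add fwmark "$mark" lookup "$table"
}

bind_cgroup_to_table 0x00100001 99 99
```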