netdev - Re: [PATCH net-next 1/1] ipvlan: Initial check-in of the IPVLAN driver.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 12 Nov 2014 20:11:27 +0400
From:	Pavel Emelyanov <xemul@...allels.com>
To:	Mahesh Bandewar <maheshb@...gle.com>,
	netdev <netdev@...r.kernel.org>
CC:	Eric Dumazet <edumazet@...gle.com>,
	Maciej Zenczykowski <maze@...gle.com>,
	Laurent Chavey <chavey@...gle.com>,
	Tim Hockin <thockin@...gle.com>,
	David Miller <davem@...emloft.net>,
	Brandon Philips <brandon.philips@...eos.com>
Subject: Re: [PATCH net-next 1/1] ipvlan: Initial check-in of the IPVLAN driver.

On 11/12/2014 02:29 AM, Mahesh Bandewar wrote:
> This driver is very similar to the macvlan driver except that it
> uses L3 on the frame to determine the logical interface while
> functioning as packet dispatcher. It inherits L2 of the master
> device hence the packets on wire will have the same L2 for all
> the packets originating from all virtual devices off of the same
> master device.
> 
> This driver was developed keeping the namespace use-case in
> mind. Hence most of the examples given here take that as the
> base setup where main-device belongs to the default-ns and
> virtual devices are assigned to the additional namespaces.
> 
> The device operates in two different modes and the difference
> in these two modes in primarily in the TX side.
> 
> (a) L2 mode : In this mode, the device behaves as a L2 device.
> TX processing upto L2 happens on the stack of the virtual device
> associated with (namespace). Packets are switched after that
> into the main device (default-ns) and queued for xmit.
> 
> RX processing is simple and all multicast, broadcast (if
> applicable), and unicast belonging to the address(es) are
> delivered to the virtual devices.
> 
> (b) L3 mode : In this mode, the device behaves like a L3 device.
> TX processing upto L3 happens on the stack of the virtual device
> associated with (namespace). Packets are switched to the
> main-device (default-ns) for the L2 processing. Hence the routing
> table of the default-ns will be used in this mode.
> 
> RX processins is somewhat similar to the L2 mode except that in
> this mode only Unicast packets are delivered to the virtual device
> while main-dev will handle all other packets.
> 
> The devices can be added using the "ip" command from the iproute2
> package -
> 
> 	ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]
> 
> Signed-off-by: Mahesh Bandewar <maheshb@...gle.com>
> Cc: Eric Dumazet <edumazet@...gle.com>
> Cc: Maciej Żenczykowski <maze@...gle.com>
> Cc: Laurent Chavey <chavey@...gle.com>
> Cc: Tim Hockin <thockin@...gle.com>
> Cc: Brandon Philips <brandon.philips@...eos.com>
> Cc: Pavel Emelianov <xemul@...allels.com>

Acked-by: /me on the general idea. We use this device of type in Parallels
heavily for several reasons -- not to generate too many MAC-s from one host
and to "enforce" the IP address for a container. I have a comment about the
latter below.


> +static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
> +{
> +	void *lyr3h = NULL;
> +
> +	switch (skb->protocol) {
> +	case htons(ETH_P_ARP): {
> +		struct arphdr *arph;
> +
> +		if (unlikely(!pskb_may_pull(skb, sizeof(struct arphdr))))
> +			return NULL;
> +
> +		arph = arp_hdr(skb);
> +		*type = IPVL_ARP;
> +		lyr3h = arph;
> +		break;
> +	}
> +
> +	case htons(ETH_P_IP): {
> +		u32 pktlen;
> +		struct iphdr *ip4h;
> +
> +		if (unlikely(!pskb_may_pull(skb, sizeof(struct iphdr))))
> +			return NULL;
> +
> +		ip4h = ip_hdr(skb);
> +		pktlen = ntohs(ip4h->tot_len);
> +		if (ip4h->ihl < 5 || ip4h->version != 4)
> +			return NULL;
> +		if (skb->len < pktlen || pktlen < (ip4h->ihl * 4))
> +			return NULL;
> +
> +		*type = IPVL_IPV4;
> +		lyr3h = ip4h;
> +		break;
> +	}
> +	case htons(ETH_P_IPV6): {
> +		struct ipv6hdr *ip6h;
> +
> +		if (unlikely(!pskb_may_pull(skb, sizeof(struct iphdr))))

Misprint -- should be sizeof(struct ipv6hdr)

> +static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
> +			   struct nlattr *tb[], struct nlattr *data[])
> +{
> +	struct ipvl_dev *ipvlan = netdev_priv(dev);
> +	struct ipvl_port *port;
> +	struct net_device *phy_dev;
> +	int err;
> +
> +	ipvlan_dbg(3, "%s[%d]: Entering...\n", __func__, __LINE__);
> +	if (!tb[IFLA_LINK]) {
> +		ipvlan_dbg(3, "%s[%d]: Returning -EINVAL...\n",
> +			   __func__, __LINE__);
> +		return -EINVAL;
> +	}
> +
> +	phy_dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
> +	if (phy_dev == NULL) {
> +		ipvlan_dbg(3, "%s[%d]: Returning -ENODEV...\n",
> +			   __func__, __LINE__);
> +		return -ENODEV;
> +	}
> +
> +	/* TODO will someone try creating ipvlan-dev on an ipvlan-virtual dev?*/
> +	if (!ipvlan_dev_master(phy_dev)) {
> +		err = ipvlan_port_create(phy_dev);
> +		if (err < 0) {
> +			ipvlan_dbg(3, "%s[%d]: Returning error (%d)...\n",
> +				   __func__, __LINE__, err);
> +			return err;
> +		}
> +	}
> +
> +	port = ipvlan_port_get_rtnl(phy_dev);
> +	/* Get the mode if specified. */
> +	if (data && data[IFLA_IPVLAN_MODE])
> +		port->mode = nla_get_u16(data[IFLA_IPVLAN_MODE]);

Should the invalid value be checked here? There are places
where we BUG() in mode being "unknown".

> +
> +	ipvlan->phy_dev = phy_dev;
> +	ipvlan->dev = dev;
> +	ipvlan->port = port;
> +	ipvlan->sfeatures = IPVLAN_FEATURES;
> +	INIT_LIST_HEAD(&ipvlan->addrs);
> +	ipvlan->ipv4cnt = 0;
> +	ipvlan->ipv6cnt = 0;


> +static int ipvlan_device_event(struct notifier_block *unused,
> +			       unsigned long event, void *ptr)
> +{
> +	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> +	struct ipvl_dev *ipvlan, *next;
> +	struct ipvl_port *port;
> +	LIST_HEAD(lst_kill);
> +
> +	if (!ipvlan_dev_master(dev))
> +		return NOTIFY_DONE;
> +
> +	port = ipvlan_port_get_rtnl(dev);
> +
> +	switch (event) {
> +	case NETDEV_CHANGE:
> +		list_for_each_entry(ipvlan, &port->ipvlans, pnode)
> +			netif_stacked_transfer_operstate(ipvlan->phy_dev,
> +							 ipvlan->dev);
> +		break;
> +
> +	case NETDEV_UNREGISTER:
> +		if (dev->reg_state != NETREG_UNREGISTERING)
> +			break;
> +
> +		list_for_each_entry_safe(ipvlan, next, &port->ipvlans,
> +					 pnode)
> +			ipvlan->dev->rtnl_link_ops->dellink(ipvlan->dev,
> +							    &lst_kill);
> +		unregister_netdevice_many(&lst_kill);
> +		list_del(&lst_kill);

This list_del seems to be excessive.

> +		break;
> +

> +static int ipvlan_addr4_event(struct notifier_block *unused,
> +			      unsigned long event, void *ptr)
> +{
> +	struct in_ifaddr *if4 = (struct in_ifaddr *)ptr;
> +	struct net_device *dev = (struct net_device *)if4->ifa_dev->dev;
> +	struct ipvl_dev *ipvlan = netdev_priv(dev);
> +	struct in_addr ip4_addr;
> +
> +	ipvlan_dbg(3, "%s[%d]: Entering...\n", __func__, __LINE__);
> +	if (!ipvlan_dev_slave(dev))
> +		return NOTIFY_DONE;
> +
> +	if (!ipvlan || !ipvlan->port)
> +		return NOTIFY_DONE;
> +
> +	switch (event) {
> +	case NETDEV_UP:

Can it be (in the future) somehow restricted so that net-namespace wouldn't
be able to assign arbitrary IP address here? One of the reasons for using
such devices is to enforce the container to use the IP address given from
the host.

> +		ip4_addr.s_addr = if4->ifa_address;
> +		if (ipvlan_add_addr4(ipvlan, &ip4_addr))
> +			return NOTIFY_BAD;
> +		break;
> +
> +	case NETDEV_DOWN:
> +		ip4_addr.s_addr = if4->ifa_address;
> +		ipvlan_del_addr4(ipvlan, &ip4_addr);
> +		break;
> +	}
> +
> +	ipvlan_dbg(3, "%s[%d]: Leaving...\n", __func__, __LINE__);
> +	return NOTIFY_OK;
> +}

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html