netdev - Re: [PATCH net-next 1/1] ipvlan: Initial check-in of the IPVLAN driver.

Message-ID: <CAF2d9jgRFZzjtEkwVqo5Jw1rzbAS_9NC8LiOtO6xHgWqkVM2Zg@mail.gmail.com>
Date:	Wed, 12 Nov 2014 15:56:14 -0800
From:	Mahesh Bandewar <maheshb@...gle.com>
To:	Pavel Emelyanov <xemul@...allels.com>
Cc:	netdev <netdev@...r.kernel.org>,
	Eric Dumazet <edumazet@...gle.com>,
	Maciej Zenczykowski <maze@...gle.com>,
	Laurent Chavey <chavey@...gle.com>,
	Tim Hockin <thockin@...gle.com>,
	David Miller <davem@...emloft.net>,
	Brandon Philips <brandon.philips@...eos.com>
Subject: Re: [PATCH net-next 1/1] ipvlan: Initial check-in of the IPVLAN driver.

On Wed, Nov 12, 2014 at 8:11 AM, Pavel Emelyanov <xemul@...allels.com> wrote:
> On 11/12/2014 02:29 AM, Mahesh Bandewar wrote:
>> This driver is very similar to the macvlan driver except that it
>> uses L3 on the frame to determine the logical interface while
>> functioning as packet dispatcher. It inherits L2 of the master
>> device hence the packets on wire will have the same L2 for all
>> the packets originating from all virtual devices off of the same
>> master device.
>>
>> This driver was developed keeping the namespace use-case in
>> mind. Hence most of the examples given here take that as the
>> base setup where main-device belongs to the default-ns and
>> virtual devices are assigned to the additional namespaces.
>>
>> The device operates in two different modes, and the difference
>> between them is primarily on the TX side.
>>
>> (a) L2 mode : In this mode, the device behaves as an L2 device.
>> TX processing up to L2 happens on the stack of the namespace
>> associated with the virtual device. Packets are then switched
>> into the main device (default-ns) and queued for xmit.
>>
>> RX processing is simple and all multicast, broadcast (if
>> applicable), and unicast belonging to the address(es) are
>> delivered to the virtual devices.
>>
>> (b) L3 mode : In this mode, the device behaves like an L3 device.
>> TX processing up to L3 happens on the stack of the namespace
>> associated with the virtual device. Packets are then switched to
>> the main-device (default-ns) for the L2 processing. Hence the
>> routing table of the default-ns will be used in this mode.
>>
>> RX processing is somewhat similar to the L2 mode except that in
>> this mode only unicast packets are delivered to the virtual device
>> while main-dev will handle all other packets.
>>
>> The devices can be added using the "ip" command from the iproute2
>> package -
>>
>>       ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]
>>
>> Signed-off-by: Mahesh Bandewar <maheshb@...gle.com>
>> Cc: Eric Dumazet <edumazet@...gle.com>
>> Cc: Maciej Żenczykowski <maze@...gle.com>
>> Cc: Laurent Chavey <chavey@...gle.com>
>> Cc: Tim Hockin <thockin@...gle.com>
>> Cc: Brandon Philips <brandon.philips@...eos.com>
>> Cc: Pavel Emelianov <xemul@...allels.com>
>
> Acked-by: /me on the general idea. We use this type of device in Parallels
> heavily for several reasons -- not to generate too many MAC-s from one host
> and to "enforce" the IP address for a container. I have a comment about the
> latter below.
>
>
>> +static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
>> +{
>> +     void *lyr3h = NULL;
>> +
>> +     switch (skb->protocol) {
>> +     case htons(ETH_P_ARP): {
>> +             struct arphdr *arph;
>> +
>> +             if (unlikely(!pskb_may_pull(skb, sizeof(struct arphdr))))
>> +                     return NULL;
>> +
>> +             arph = arp_hdr(skb);
>> +             *type = IPVL_ARP;
>> +             lyr3h = arph;
>> +             break;
>> +     }
>> +
>> +     case htons(ETH_P_IP): {
>> +             u32 pktlen;
>> +             struct iphdr *ip4h;
>> +
>> +             if (unlikely(!pskb_may_pull(skb, sizeof(struct iphdr))))
>> +                     return NULL;
>> +
>> +             ip4h = ip_hdr(skb);
>> +             pktlen = ntohs(ip4h->tot_len);
>> +             if (ip4h->ihl < 5 || ip4h->version != 4)
>> +                     return NULL;
>> +             if (skb->len < pktlen || pktlen < (ip4h->ihl * 4))
>> +                     return NULL;
>> +
>> +             *type = IPVL_IPV4;
>> +             lyr3h = ip4h;
>> +             break;
>> +     }
>> +     case htons(ETH_P_IPV6): {
>> +             struct ipv6hdr *ip6h;
>> +
>> +             if (unlikely(!pskb_may_pull(skb, sizeof(struct iphdr))))
>
> Misprint -- should be sizeof(struct ipv6hdr)
>
Good catch, will correct it!
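
To make the size mismatch concrete, here is a minimal userspace sketch
(not kernel code). It assumes a Linux libc, where struct iphdr and
struct ip6_hdr stand in for the kernel's struct iphdr and struct
ipv6hdr. The helper may_pull() is a hypothetical simplification of
pskb_may_pull() that only compares lengths.

```c
#include <assert.h>       /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>
#include <netinet/ip.h>   /* struct iphdr (Linux libc) */
#include <netinet/ip6.h>  /* struct ip6_hdr */

/* The fixed IPv6 header is twice the size of an option-less IPv4 header. */
static_assert(sizeof(struct iphdr) == 20, "IPv4 header without options");
static_assert(sizeof(struct ip6_hdr) == 40, "fixed IPv6 header");

/* Hypothetical stand-in for pskb_may_pull(): are 'need' bytes available? */
static bool may_pull(size_t avail, size_t need)
{
	return avail >= need;
}

/* The check as posted: only asks for an IPv4-sized header. */
static bool ipv6_check_buggy(size_t avail)
{
	return may_pull(avail, sizeof(struct iphdr));
}

/* The corrected check: asks for the full fixed IPv6 header. */
static bool ipv6_check_fixed(size_t avail)
{
	return may_pull(avail, sizeof(struct ip6_hdr));
}
```

With only 20 bytes available, ipv6_check_buggy() accepts the packet
while ipv6_check_fixed() rejects it, so under the posted check the
later reads of the IPv6 addresses could go past the data that was
actually validated.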

>> +static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
>> +                        struct nlattr *tb[], struct nlattr *data[])
>> +{
>> +     struct ipvl_dev *ipvlan = netdev_priv(dev);
>> +     struct ipvl_port *port;
>> +     struct net_device *phy_dev;
>> +     int err;
>> +
>> +     ipvlan_dbg(3, "%s[%d]: Entering...\n", __func__, __LINE__);
>> +     if (!tb[IFLA_LINK]) {
>> +             ipvlan_dbg(3, "%s[%d]: Returning -EINVAL...\n",
>> +                        __func__, __LINE__);
>> +             return -EINVAL;
>> +     }
>> +
>> +     phy_dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
>> +     if (phy_dev == NULL) {
>> +             ipvlan_dbg(3, "%s[%d]: Returning -ENODEV...\n",
>> +                        __func__, __LINE__);
>> +             return -ENODEV;
>> +     }
>> +
>> +     /* TODO will someone try creating ipvlan-dev on an ipvlan-virtual dev?*/
>> +     if (!ipvlan_dev_master(phy_dev)) {
>> +             err = ipvlan_port_create(phy_dev);
>> +             if (err < 0) {
>> +                     ipvlan_dbg(3, "%s[%d]: Returning error (%d)...\n",
>> +                                __func__, __LINE__, err);
>> +                     return err;
>> +             }
>> +     }
>> +
>> +     port = ipvlan_port_get_rtnl(phy_dev);
>> +     /* Get the mode if specified. */
>> +     if (data && data[IFLA_IPVLAN_MODE])
>> +             port->mode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
>
> Should the invalid value be checked here? There are places
> where we BUG() in mode being "unknown".
>
Assuming the calls come over netlink, the ".validate" will be called
before ".newlink", so that check would be unnecessary, wouldn't it?
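
The ordering argument can be modelled in userspace roughly as below.
This is only a sketch under stated assumptions: the demo_* names and
the mode values are hypothetical, and demo_create_link() merely
imitates a caller that runs a validate step before the newlink step.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mode values, for illustration only. */
enum demo_ipvlan_mode {
	DEMO_MODE_L2 = 0,
	DEMO_MODE_L3,
	DEMO_MODE_MAX
};

struct demo_port {
	uint16_t mode;
};

/* Model of a ".validate" hook; mode == NULL means "attribute absent". */
static int demo_validate(const uint16_t *mode)
{
	if (mode && *mode >= DEMO_MODE_MAX)
		return -EINVAL;
	return 0;
}

/* Model of the caller: validate first, touch the port only on success. */
static int demo_create_link(struct demo_port *port, const uint16_t *mode)
{
	int err = demo_validate(mode);

	if (err)
		return err;
	if (mode)
		port->mode = *mode;	/* "newlink" step, sees a checked value */
	return 0;
}
```

In this model an out-of-range mode never reaches the assignment, which
is the property the reply relies on. It holds only as long as every
path that sets the mode goes through the validate step.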

>> +
>> +     ipvlan->phy_dev = phy_dev;
>> +     ipvlan->dev = dev;
>> +     ipvlan->port = port;
>> +     ipvlan->sfeatures = IPVLAN_FEATURES;
>> +     INIT_LIST_HEAD(&ipvlan->addrs);
>> +     ipvlan->ipv4cnt = 0;
>> +     ipvlan->ipv6cnt = 0;
>
>
>> +static int ipvlan_device_event(struct notifier_block *unused,
>> +                            unsigned long event, void *ptr)
>> +{
>> +     struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>> +     struct ipvl_dev *ipvlan, *next;
>> +     struct ipvl_port *port;
>> +     LIST_HEAD(lst_kill);
>> +
>> +     if (!ipvlan_dev_master(dev))
>> +             return NOTIFY_DONE;
>> +
>> +     port = ipvlan_port_get_rtnl(dev);
>> +
>> +     switch (event) {
>> +     case NETDEV_CHANGE:
>> +             list_for_each_entry(ipvlan, &port->ipvlans, pnode)
>> +                     netif_stacked_transfer_operstate(ipvlan->phy_dev,
>> +                                                      ipvlan->dev);
>> +             break;
>> +
>> +     case NETDEV_UNREGISTER:
>> +             if (dev->reg_state != NETREG_UNREGISTERING)
>> +                     break;
>> +
>> +             list_for_each_entry_safe(ipvlan, next, &port->ipvlans,
>> +                                      pnode)
>> +                     ipvlan->dev->rtnl_link_ops->dellink(ipvlan->dev,
>> +                                                         &lst_kill);
>> +             unregister_netdevice_many(&lst_kill);
>> +             list_del(&lst_kill);
>
> This list_del seems to be excessive.
>
That is correct. Looks like unregister_netdevice_many() does it now.
I'll remove it.

>> +             break;
>> +
>
>> +static int ipvlan_addr4_event(struct notifier_block *unused,
>> +                           unsigned long event, void *ptr)
>> +{
>> +     struct in_ifaddr *if4 = (struct in_ifaddr *)ptr;
>> +     struct net_device *dev = (struct net_device *)if4->ifa_dev->dev;
>> +     struct ipvl_dev *ipvlan = netdev_priv(dev);
>> +     struct in_addr ip4_addr;
>> +
>> +     ipvlan_dbg(3, "%s[%d]: Entering...\n", __func__, __LINE__);
>> +     if (!ipvlan_dev_slave(dev))
>> +             return NOTIFY_DONE;
>> +
>> +     if (!ipvlan || !ipvlan->port)
>> +             return NOTIFY_DONE;
>> +
>> +     switch (event) {
>> +     case NETDEV_UP:
>
> Can it be (in the future) somehow restricted so that net-namespace wouldn't
> be able to assign arbitrary IP address here? One of the reasons for using
> such devices is to enforce the container to use the IP address given from
> the host.
>
Probably this could be a config (sysfs?) entry which, when set, would
lock down the config coming from the ns. So the code could look like -

        case NETDEV_UP:
                if (!restrict_ns_config) {
                        ...
                }
                break;
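
A slightly fuller userspace model of that idea follows. It is a sketch
only: restrict_ns_config is the hypothetical knob from the snippet
above, and the demo_* types, the fixed-size address table and the
boolean return value are all assumptions made for illustration. How
the host would install the permitted addresses is left open here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ADDRS 4

enum demo_event { DEMO_ADDR_UP, DEMO_ADDR_DOWN };

struct demo_ipvlan {
	bool restrict_ns_config;	/* hypothetical knob set by the host */
	uint32_t addrs[DEMO_MAX_ADDRS];	/* addresses the device answers for */
	size_t naddrs;
};

static bool demo_has_addr(const struct demo_ipvlan *d, uint32_t addr)
{
	for (size_t i = 0; i < d->naddrs; i++)
		if (d->addrs[i] == addr)
			return true;
	return false;
}

/* Returns true if the address table changed. */
static bool demo_addr_event(struct demo_ipvlan *d, enum demo_event ev,
			    uint32_t addr)
{
	switch (ev) {
	case DEMO_ADDR_UP:
		if (d->restrict_ns_config)
			return false;	/* ignore addresses set from the ns */
		if (d->naddrs == DEMO_MAX_ADDRS || demo_has_addr(d, addr))
			return false;
		d->addrs[d->naddrs++] = addr;
		return true;
	case DEMO_ADDR_DOWN:
		for (size_t i = 0; i < d->naddrs; i++) {
			if (d->addrs[i] == addr) {
				/* fill the hole with the last entry */
				d->addrs[i] = d->addrs[--d->naddrs];
				return true;
			}
		}
		return false;
	}
	return false;
}
```

As in the snippet above, only the UP path is guarded. Whether DOWN
should also be restricted is a separate question that this model does
not try to answer.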

>> +             ip4_addr.s_addr = if4->ifa_address;
>> +             if (ipvlan_add_addr4(ipvlan, &ip4_addr))
>> +                     return NOTIFY_BAD;
>> +             break;
>> +
>> +     case NETDEV_DOWN:
>> +             ip4_addr.s_addr = if4->ifa_address;
>> +             ipvlan_del_addr4(ipvlan, &ip4_addr);
>> +             break;
>> +     }
>> +
>> +     ipvlan_dbg(3, "%s[%d]: Leaving...\n", __func__, __LINE__);
>> +     return NOTIFY_OK;
>> +}
>