[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAF2d9jjJ4i7HMOMikp=C9W+70tL51YhpgBMadM2ZA-RYbRAzsw@mail.gmail.com>
Date: Wed, 20 Feb 2019 10:40:15 -0800
From: Mahesh Bandewar (महेश बंडेवार)
<maheshb@...gle.com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: David Miller <davem@...emloft.net>, m@...bda.lt,
linux-netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH net] ipvlan: disallow userns cap_net_admin to change
global mode/flags
On Tue, Feb 19, 2019 at 3:38 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> When running Docker with userns isolation e.g. --userns-remap="default"
> and spawning up some containers with CAP_NET_ADMIN under this realm, I
> noticed that link changes on ipvlan slave device inside that container
> can affect all devices from this ipvlan group which are in other net
> namespaces where the container should have no permission to make changes
> to, such as the init netns, for example.
>
> This effectively allows to undo ipvlan private mode and switch globally to
> bridge mode where slaves can communicate directly without going through
> hostns, or it allows to switch between global operation mode (l2/l3/l3s)
> for everyone bound to the given ipvlan master device. libnetwork plugin
> here is creating an ipvlan master and ipvlan slave in hostns and a slave
> each that is moved into the container's netns upon creation event.
>
> * In hostns:
>
> # ip -d a
> [...]
> 8: cilium_host@...d0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
> link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
> ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
> inet 10.41.0.1/32 scope link cilium_host
> valid_lft forever preferred_lft forever
> [...]
>
> * Spawn container & change ipvlan mode setting inside of it:
>
> # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
> 9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c
>
> # docker exec -ti client ip -d a
> [...]
> 10: cilium0@if4: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
> link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
> ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
> inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
> valid_lft forever preferred_lft forever
>
> # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2
>
> # docker exec -ti client ip -d a
> [...]
> 10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
> link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
> ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
> inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
> valid_lft forever preferred_lft forever
>
> * In hostns (mode switched to l2):
>
> # ip -d a
> [...]
> 8: cilium_host@...d0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
> link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
> ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
> inet 10.41.0.1/32 scope link cilium_host
> valid_lft forever preferred_lft forever
> [...]
>
> Same l3 -> l2 switch would also happen by creating another slave inside
> the container's network namespace when specifying the existing cilium0
> link to derive the actual (bond0) master:
>
> # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2
>
> # docker exec -ti client ip -d a
> [...]
> 2: cilium1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
> link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
> ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
> 10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
> link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
> ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
> inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
> valid_lft forever preferred_lft forever
>
> * In hostns:
>
> # ip -d a
> [...]
> 8: cilium_host@...d0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
> link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
> ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
> inet 10.41.0.1/32 scope link cilium_host
> valid_lft forever preferred_lft forever
> [...]
>
> One way to mitigate it is to check CAP_NET_ADMIN permissions of
> the ipvlan master device's ns, and only then allow to change
> mode or flags for all devices bound to it. Above two cases are
> then disallowed after the patch.
thanks for the fix Daniel.
>
> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
Acked-by: Mahesh Bandewar <maheshb@...gle.com>
> ---
> drivers/net/ipvlan/ipvlan_main.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 7cdac77..07e41c4 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -499,6 +499,8 @@ static int ipvlan_nl_changelink(struct net_device *dev,
>
> if (!data)
> return 0;
> + if (!ns_capable(dev_net(ipvlan->phy_dev)->user_ns, CAP_NET_ADMIN))
> + return -EPERM;
>
> if (data[IFLA_IPVLAN_MODE]) {
> u16 nmode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
> @@ -601,6 +603,8 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
> struct ipvl_dev *tmp = netdev_priv(phy_dev);
>
> phy_dev = tmp->phy_dev;
> + if (!ns_capable(dev_net(phy_dev)->user_ns, CAP_NET_ADMIN))
> + return -EPERM;
> } else if (!netif_is_ipvlan_port(phy_dev)) {
> /* Exit early if the underlying link is invalid or busy */
> if (phy_dev->type != ARPHRD_ETHER ||
> --
> 2.7.4
>
Powered by blists - more mailing lists