netdev - macvlan devices and vlan interaction

Date:   Mon, 29 Jan 2018 23:01:40 +0000
From:   "Keller, Jacob E" <jacob.e.keller@...el.com>
To:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC:     "Duyck, Alexander H" <alexander.h.duyck@...el.com>
Subject: macvlan devices and vlan interaction

Hi,

I'm currently investigating how macvlan devices behave with regard to VLAN support, and I found some interesting behavior. I am not sure how best to correct it, or what the right path forward is.

If I create a macvlan device:

ip link add link ens0 name macvlan0 type macvlan

and then add a VLAN to it:

ip link add link macvlan0 name vlan10 type vlan id 10

This works to pass VLAN 10 traffic over the macvlan device. This seems like expected behavior.

However, if I then also add vlan 10 to the lowerdev:

ip link add link ens0 name lowervlan10 type vlan id 10

Then traffic stops flowing to the VLAN on the macvlan device.
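
For anyone who wants to replay the sequence above, here is a minimal sketch that wraps the three commands in a function. The helper names `run` and `repro_macvlan_vlan` and the `DRY_RUN`/`LOWER` variables are made up for illustration; only the `ip link` commands themselves come from this message. It defaults to printing the commands, since actually creating the devices needs root and a real lower device.

```shell
#!/bin/sh
# Sketch only: replays the three commands from this message in order.
# LOWER is the lower device name used in this message (ens0).
LOWER="${LOWER:-ens0}"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro_macvlan_vlan() {
    # 1. macvlan on top of the lower device
    run ip link add link "$LOWER" name macvlan0 type macvlan
    # 2. VLAN 10 on top of the macvlan (traffic flows at this point)
    run ip link add link macvlan0 name vlan10 type vlan id 10
    # 3. VLAN 10 directly on the lower device (vlan10 then stops receiving)
    run ip link add link "$LOWER" name lowervlan10 type vlan id 10
}
```

Calling `repro_macvlan_vlan` prints the commands; `DRY_RUN=0 repro_macvlan_vlan` as root would run them.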

This happens, as far as I can tell, because the VLAN traffic is filtered first and then forwarded to the VLAN device on the lowerdev, which does not know that the macvlan device exists.

It seems, essentially, that vlan stacked on top of a macvlan shouldn't work. Because the vlan code basically expects each vlan to apply to every MAC address, and the macvlan device works by putting its MAC address into the unicast address list, there's no way for a device driver to know when or how to apply the vlan.

This gets a bit more confusing when we add in the l2 fwd hardware offload.

Currently, at least for the Intel network parts, this isn't supported, because of a bug in which the device drivers don't apply the VLANs to the macvlan accelerated addresses. If we fix this, at least for fm10k, the behavior is slightly better, because of how the hardware filtering at the MAC address happens first, and we direct the traffic to the proper device regardless of VLAN.

In addition to this peculiarity of VLANs on both the macvlan and lowerdev, there is another: when a macvlan device adds a VLAN, the lowerdev gets an indication to add the VLAN via its .ndo_vlan_rx_add_vid(), which doesn't distinguish which addresses the VLAN might apply to. Depending on hardware design, it thus simply enables the VLAN for all its unicast and multicast addresses. Some hardware could theoretically support MAC+VLAN pairs, where it could distinguish that a VLAN should only be added for some subset of addresses. Other hardware might not be so lucky.
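
As a rough aid for checking which of the features discussed here a given device advertises, the sketch below pulls one feature's state out of `ethtool -k` output. The helper name `feature_state` is made up for illustration. I am assuming the feature names as `ethtool -k` prints them: `rx-vlan-filter` (hardware VLAN filtering), `l2-fwd-offload` (the l2 fwd offload) and `vlan-challenged`.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k <dev>` output on stdin and print
# the state ("on" or "off") of the feature named in $1.
feature_state() {
    sed -n "s/^[[:space:]]*$1:[[:space:]]*\([a-z]*\).*/\1/p"
}

# Example use (assumes ethtool is installed and ens0 exists):
#   ethtool -k ens0 | feature_state rx-vlan-filter
#   ethtool -k ens0 | feature_state l2-fwd-offload
#   ethtool -k macvlan0 | feature_state vlan-challenged
```

This only reports what the device advertises; it says nothing about whether the hardware can filter on MAC+VLAN pairs.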

Unfortunately, this has the weird consequence that if we have the following stack of devices:

vlan10@...vlan0
macvlan0@...0
ens0

Then ens0 will receive VLAN10 traffic on every address. So VLAN 10 traffic destined to the MAC of the lowerdev will be received, instead of dropped.

If we add VLAN 10 to the lowerdev so we have both the above stack and also

lowervlan10@...0
ens0 (mac gg:hh:ii:jj:kk)

then all vlan 10 traffic will be received on the lowerdev VLAN 10, without any being forwarded to the VLAN10 attached to the macvlan.

However, if we add two macvlans, and each add the vlan10, so we have the following:

avlan10@...vlan0
macvlan0@...0
ens0

bvlan10@...vlan1
macvlan1@...0
ens0

In this case, it does appear that traffic is sorted out correctly. It seems that only if the lowerdev gets the VLAN does it end up breaking. If I remove bvlan10 from macvlan1, the traffic associated with vlan10 is still received by macvlan1, even though in principle it should no longer be.

What is the correct behavior here? Should this just be "administrators should know better"? I don't think that's a great argument, and either way we're still essentially leaking VLANs across the macvlan interfaces, which I don't think is ideal.

I see two possible solutions:

1) modify macvlan driver so that it is marked as VLAN_CHALLENGED, and thus indicate it cannot handle VLAN traffic on top of it.
  a. In order to get the VLANs associated, administrator could instead add the VLAN first, and then add the macvlan on top. This I think is a better configuration.
  b. that doesn't work in the offload case, unless/until we fix the VLAN interface to forward the l2_dfwd_add_station() along with a vid.
  c. this could appear as loss of functionality, since in some cases these VLAN on top of macvlan work today (with the interesting caveats listed above).

2) modify how VLANs interact with MAC addresses, so that the lowerdev can explicitly be aware of which VLANs are tied to which address groups, in order to allow for the explicit configuration of which MAC+VLAN pairs are actually allowed.
  a. this is a much more invasive change to driver interface, and more difficult to get right
  b. possibly other configurations of stacked devices might have a similar problem, so we could solve more here? Or create more problems. I'm not really certain.
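
The alternative ordering from (1a), VLAN first and macvlan on top, could look roughly like the sketch below. The device name `ens0.10`, the helper `run`, the function name `vlan_then_macvlan` and the `DRY_RUN`/`LOWER` variables are made up for illustration. It prints the commands by default, since creating the devices needs root.

```shell
#!/bin/sh
# Sketch only: VLAN 10 on the lower device first, macvlan stacked on the VLAN.
LOWER="${LOWER:-ens0}"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

vlan_then_macvlan() {
    # VLAN 10 directly on the lower device
    run ip link add link "$LOWER" name "${LOWER}.10" type vlan id 10
    # macvlan on top of the VLAN device, so its MAC only ever sees VLAN 10
    run ip link add link "${LOWER}.10" name macvlan0 type macvlan
}
```

With this stacking the VLAN is resolved on the lowerdev before the macvlan sees the frame, so the lowerdev never has to guess which MAC addresses the VLAN applies to.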


I think the correct solution is (1) but I wasn't sure what others thought, and whether anyone else has encountered the problems I mention and outline above. I cc'd Alex who I discussed with offline when I first heard of and began investigating this, in case he has anything further to add.

Regards,
Jake