Message-Id: <200908101707.32751.arnd@arndb.de>
Date: Mon, 10 Aug 2009 17:07:32 +0200
From: Arnd Bergmann <arnd@...db.de>
To: "Fischer, Anna" <anna.fischer@...com>
Cc: "'Stephen Hemminger'" <shemminger@...ux-foundation.org>,
"bridge@...ts.linux-foundation.org"
<bridge@...ts.linux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"virtualization@...ts.linux-foundation.org"
<virtualization@...ts.linux-foundation.org>,
"evb@...oogroups.com" <evb@...oogroups.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"kaber@...sh.net" <kaber@...sh.net>,
"adobriyan@...il.com" <adobriyan@...il.com>,
"Paul Congdon (UC Davis)" <ptcongdon@...avis.edu>,
Eric Biederman <ebiederm@...ssion.com>
Subject: Re: [PATCH][RFC] net/bridge: add basic VEPA support
On Monday 10 August 2009, Fischer, Anna wrote:
> > Subject: Re: [PATCH][RFC] net/bridge: add basic VEPA support
> >
> > On Friday 07 August 2009, Paul Congdon (UC Davis) wrote:
> > > As I understand the macvlan code, it currently doesn't allow two VMs
> > on the
> > > same machine to communicate with one another.
> >
> > There are patches to do that. I think if we add that, there should be
> > a way to choose the behavior between either bridging between the
> > guests or VEPA.
>
> If you implement this direct bridging capability between local VMs for
> macvlan, then would this not break existing applications that currently
> use it? It would be quite a significant change to how macvlan works
> today. I guess, ideally, you would want to have macvlan work in
> separate modes, e.g. traditional macvlan, bridging, and VEPA.
Right, that's what I meant by my sentence above. I'm not sure
if we need to differentiate between traditional macvlan and VEPA,
though. AFAICT, the only difference should be the handling of broadcast
and multicast frames returning from the hairpin turn. Since this
does not happen with a traditional macvlan, we can always send them
to all macvlan ports except the source port.
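To make the distinction concrete, here is a toy model of that forwarding rule. This is only an illustrative sketch in userspace Python, not the kernel code; the port names, MAC addresses, and function names are all hypothetical. In VEPA mode every guest frame goes out the uplink, and a broadcast coming back from the hairpin turn is delivered to every port except the one that originated it, filtered on the source MAC:

```python
# Toy model of macvlan VEPA forwarding (hypothetical names; the real
# implementation operates on sk_buffs inside the kernel macvlan driver).

def vepa_forward_from_port(src_port, frame, ports):
    # In VEPA mode, all outbound frames go to the uplink so the
    # adjacent (hairpin-capable) bridge can make the forwarding decision.
    return ["uplink"]

def vepa_deliver_from_uplink(frame, ports):
    # A broadcast/multicast frame returning from the hairpin turn must
    # reach every macvlan port except the original sender, so we filter
    # on the source MAC to keep a guest from seeing its own packet.
    return [port for port, mac in ports.items() if mac != frame["src"]]

ports = {"macvlan0": "52:54:00:00:00:01",
         "macvlan1": "52:54:00:00:00:02",
         "macvlan2": "52:54:00:00:00:03"}

bcast = {"src": "52:54:00:00:00:01", "dst": "ff:ff:ff:ff:ff:ff"}

# Outbound: the frame only goes to the uplink, never directly to peers.
assert vepa_forward_from_port("macvlan0", bcast, ports) == ["uplink"]
# Reflected: delivered to all ports except the source.
assert vepa_deliver_from_uplink(bcast, ports) == ["macvlan1", "macvlan2"]
```

A traditional (non-VEPA) macvlan never sees the reflected copy, which is why it can unconditionally deliver locally-originated broadcasts to all other ports.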
> > > I could imagine a hairpin mode on the adjacent bridge making this
> > > possible, but the macvlan code would need to be updated to filter
> > > reflected frames so a source did not receive his own packet.
> >
> > Right, I missed this point so far. I'll follow up with a patch
> > to do that.
>
> Can you maybe point me to the missing patches for macvlan that you
> have mentioned in other emails, and the one you mention above?
> E.g. enabling multicast distribution and allowing local bridging etc.
> I could not find any of those in the archives. Thanks.
The patch from Eric Biederman to allow macvlan to bridge between
its slave ports is at
http://kerneltrap.org/mailarchive/linux-netdev/2009/3/9/5125774
I could not find any patches for the other features (or bugs).
> > This is the interesting part of the discussion. The bridge and macvlan
> > drivers certainly have an overlap in functionality and you can argue
> > that you only need one. Then again, the bridge code is a little crufty
> > and we might not want to add much more to it for functionality that can
> > be implemented in a much simpler way elsewhere. My preferred way would
> > be to use bridge when you really need 802.1d MAC learning, netfilter-
> > bridge
> > and STP, while we put the optimizations for stuff like VMDq, zero-copy
> > and multiqueue guest adapters only into the macvlan code.
>
> I can see this being a possible solution.
>
> My concern with putting VEPA into macvlan instead of the bridging code
> is that there will be more work required to make it usable for other
> virtualization solutions, as macvtap will only work for KVM-type setups.
Right, I understand.
> Basically, VEPA capabilities would rely on someone developing further
> drivers to connect macvlan to different backend interfaces, e.g. one for
> KVM (macvtap), one for Xen PV drivers, one for virtio, and whatever else
> is out there, or will be there in the future. The bridging code is
> already very generic in that respect, and all virtualization layers
> can deal with connecting interfaces to a bridge.
>
> Our extensions to the bridging code to enable VEPA for the Linux kernel
> are only very minimal code changes and would make VEPA available
> to most virtualization solutions today.
I don't object to having VEPA supported in the bridge code at all.
I think your patch is simple enough that it won't hurt in the bridge
code. If Stephen prefers to do VEPA only in one component, we should
probably make it possible for that component to act as a bridge between
1+n existing interfaces as well. You can almost do that with the regular
macvlan and the bridge driver, like
/ macvlan0 - br0 - tap0
eth0 -- macvlan1 - br1 - tap1
\ macvlan2 - br2 - tap2
Here, you can have the guests attached to tap devices (or xen net ...)
and the macvlan driver doing the VEPA. Of course this is not how a
bridge is meant to work -- you would have the same MAC address on two
sides of the bridge.
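To see why that last point is a problem, here is a toy model of 802.1d source-MAC learning in that topology. This is a simplified sketch with hypothetical names, not the kernel bridge code: when the guest transmits, br0 learns its MAC on tap0, but the hairpin reflection re-enters br0 via macvlan0 with the same source MAC, so the FDB entry flaps between the two sides:

```python
# Toy model of an 802.1d learning bridge's forwarding database (FDB),
# showing the MAC flap in the macvlanN - brN - tapN workaround above.
# Port names and MAC address are hypothetical.

fdb = {}

def learn(port, src_mac):
    """Record src_mac on port; return True if the entry moved ports."""
    moved = src_mac in fdb and fdb[src_mac] != port
    fdb[src_mac] = port
    return moved

guest_mac = "52:54:00:00:00:01"

# Guest transmits through its tap device: br0 learns the MAC on tap0.
assert learn("tap0", guest_mac) is False
# The adjacent bridge hairpins the frame back; it re-enters br0 via
# macvlan0 with the same source MAC, so the entry flaps to the far side.
assert learn("macvlan0", guest_mac) is True
assert fdb[guest_mac] == "macvlan0"
```

Every reflected frame would move the entry back and forth like this, which is why the plain bridge driver cannot transparently sit between a macvlan port and a tap device here.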
So we could have another macvlan backend (let's call it macvbridge)
so you can do this:
/ macvlan0 - 'qemu -net raw'
eth0 -- macvtap0 - 'qemu -net tap,fd=3 3<>/dev/net/macvtap0'
\ macvbr0 -- tap0 - 'qemu -net tap'
The macvbr driver could then be used to associate an existing
network device with a macvlan slave port. I'm not sure whether this
has any significant advantage over your bridge patches; it does
have the obvious disadvantage that someone needs to implement
it first, while your patch is already there ;-)
Arnd <><