Message-ID: <43F901BD926A4E43B106BF17856F075501A20AB5F2@orsmsx508.amr.corp.intel.com>
Date: Wed, 30 Nov 2011 15:19:12 -0800
From: "Rose, Gregory V" <gregory.v.rose@...el.com>
To: Chris Wright <chrisw@...hat.com>,
Ben Hutchings <bhutchings@...arflare.com>
CC: Roopa Prabhu <roprabhu@...co.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"sri@...ibm.com" <sri@...ibm.com>,
"dragos.tatulea@...il.com" <dragos.tatulea@...il.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"arnd@...db.de" <arnd@...db.de>, "mst@...hat.com" <mst@...hat.com>,
"mchan@...adcom.com" <mchan@...adcom.com>,
"dwang2@...co.com" <dwang2@...co.com>,
"shemminger@...tta.com" <shemminger@...tta.com>,
"eric.dumazet@...il.com" <eric.dumazet@...il.com>,
"kaber@...sh.net" <kaber@...sh.net>,
"benve@...co.com" <benve@...co.com>
Subject: RE: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering
support for passthru mode

> -----Original Message-----
> From: Chris Wright [mailto:chrisw@...hat.com]
> Sent: Wednesday, November 30, 2011 3:01 PM
> To: Ben Hutchings
> Cc: Chris Wright; Rose, Gregory V; Roopa Prabhu; netdev@...r.kernel.org;
> davem@...emloft.net; sri@...ibm.com; dragos.tatulea@...il.com;
> kvm@...r.kernel.org; arnd@...db.de; mst@...hat.com; mchan@...adcom.com;
> dwang2@...co.com; shemminger@...tta.com; eric.dumazet@...il.com;
> kaber@...sh.net; benve@...co.com
> Subject: Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering
> support for passthru mode
>
> * Ben Hutchings (bhutchings@...arflare.com) wrote:
> > On Wed, 2011-11-30 at 13:04 -0800, Chris Wright wrote:
> > > I agree that it's confusing. Couldn't you simplify your ascii art
> > > (hopefully removing hw assumptions about receive processing, and
> > > completely ignoring vlans for the moment) to something like:
> > >
> > >              |RX
> > >              v
> > > +------------+-------------+
> > > |     +------+--------+    |
> > > |     | RX MAC filter |    |
> > > |     |and port select|    |
> > > |     +---------------+    |
> > > |            /|\           |
> > > |           / | \   match 2|
> > > |          /  v  \         |
> > > |         /match  \        |
> > > |        /  1 |    \       |
> > > |       /     |     \      |
> > > |match /      |      \     |
> > > |  0  /       |       \    |
> > > |    v        |        v   |
> > > |    |        |        |   |
> > > +----+--------+--------+---+
> > >      |        |        |
> > >      PF      VF 1     VF 2
> > >
> > > And there's an unclear number of ways to update "RX MAC filter and
> > > port select" table.
> > >
> > > 1) PF ndo_set_mac_addr
> > > I expect that to be implicit to match 0.
> > >
> > > 2) PF ndo_set_rx_mode
> > > Less clear, but I'd still expect these to implicitly match 0
> > >
> > > 3) PF ndo_set_vf_mac
> > > I expect these to be an explicit match to VF N (given the interface
> > > specifies which VF's MAC is being programmed).
> >
> > I'm not sure whether this is supposed to implicitly add to the MAC
> > filter or whether that has to be changed too. That's the main
> > difference between my models (a) and (b).
>
> I see now. I wasn't entirely clear on the difference before. It's also
> going to be hw specific. I think (Intel folks can verify) that the
> Intel SR-IOV devices have a single global unicast exact match table,
> for example.
>
> > There's also PF ndo_set_vf_vlan.
>
> Right, although I had mentioned I was trying to limit just to MAC
> filtering to simplify.
>
> > > 4) VF ndo_set_mac_addr
> > > This one may or may not be allowed (setting MAC+port if the VF is
> > > owned by a guest is likely not allowed), but would expect an implicit
> > > VF N.
> > >
> > > 5) VF ndo_set_rx_mode
> > > Same as 4) above.
> >
> > So this is where we are today.
>
> Cool, good that we agree there.
>
> > > 6) PF or VF? ndo_set_rx_filter_addr
> > > The new proposal, which has an explicit VF, although when it's VF_SELF
> > > I'm not clear if this is just the same as 5) above?
> > >
> > > Have I missed anything?
> >
> > Any physical port can be bridged to a mixture of guests with and without
> > their own VFs. Packets sent from a guest with a VF to the address of a
> > guest without a VF need to be forwarded to the PF rather than the
> > physical port, but none of the drivers currently get to know about those
> > addresses.
>
> To clarify, do you mean something like this?
>
>           physical port
>                 |
>    +------------+------------+
>    |         +-----+         |
>    |         | VEB |         |
>    |         +-----+         |
>    |        /   |   \        |
>    |       /    |    \       |
>    |      /     |     \      |
>    +-----+------+------+-----+
>          |      |      |
>         PF     VF 1   VF 2
>         /       |      |
>    +---+---+   VM4 +---+---+
>    |   sw  |       |macvtap|
>    | switch|       +---+---+
>    +-+-+-+-+           |
>     /  |  \           VM5
>    /   |   \
>   VM1 VM2 VM3
>
> This has VMs 1-3 hanging off the PF via a linux bridge (traditional hv
> switching), VM4 directly owning VF1 (pci device assignment), and VM5
> indirectly owning VF2 (macvtap passthrough, that started this whole
> thing).
>
> So, as I understand it, you're saying that VM4 or VM5 sending a packet
> to VM1 goes into the VEB, out the PF, and into the linux bridging code,
> right? At which point the PF is in promiscuous mode (btw, the same does
> not work if the bridge is attached to a VF, at least for some VFs, due
> to lack of promiscuous mode).
>
> > Packets sent from a guest with a VF to the address of another guest with
> > a VF need to be forwarded similarly, but the driver should be able to
> > infer that from (3).
>
> Right, and that works currently for the case where both guests are like
> VM4, they directly own the VF via PCI device assignment. But for VM4
> to talk to VM5, VF2 is not in promiscuous mode and has a different MAC
> address than VM5's vNIC. If the embedded bridge does not learn, and
> nobody programmed it to fwd frames for VM5 via VF2...
>
> I believe this is what Roopa's patch will allow. The question now is
> whether there's a better way to handle this?
>
> In my mind, we'd model the NIC's embedded bridge as, well, a bridge.
> And set anti-spoofing, port mirroring, port mac/vlan filtering, etc via
> that bridge.

If there were some way to push the bridge forwarding database down to the
underlying HW, so that the filters could be programmed into the HW for
non-learning VEBs, that would work too.
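
To make that concrete, here is a rough userspace sketch of the idea, not
driver code. It assumes the HW exposes a small unicast exact-match table
that maps a destination MAC to a port (0 = PF, N = VF N), and that a
frame with no match leaves via the physical port. All of the names
(struct veb, veb_fdb_add, veb_lookup, FILTER_SLOTS, PORT_UPLINK) are made
up for illustration and are not existing kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN     6
#define FILTER_SLOTS 8     /* hypothetical size of the exact-match table */
#define PORT_UPLINK  (-1)  /* no match: frame leaves via the physical port */

/* One slot of a hypothetical unicast exact-match table: MAC -> port. */
struct mac_filter {
	uint8_t mac[ETH_ALEN];
	int port;              /* 0 = PF, N = VF N */
	bool used;
};

struct veb {
	struct mac_filter slot[FILTER_SLOTS];
};

/* Forwarding decision of a non-learning VEB: exact match, else uplink. */
int veb_lookup(const struct veb *v, const uint8_t *dst)
{
	assert(v && dst);
	for (int i = 0; i < FILTER_SLOTS; i++)
		if (v->slot[i].used &&
		    memcmp(v->slot[i].mac, dst, ETH_ALEN) == 0)
			return v->slot[i].port;
	return PORT_UPLINK;
}

/* Push one software FDB entry into the table; -1 if the table is full. */
int veb_fdb_add(struct veb *v, const uint8_t *mac, int port)
{
	int free_slot = -1;

	assert(v && mac);
	for (int i = 0; i < FILTER_SLOTS; i++) {
		if (v->slot[i].used &&
		    memcmp(v->slot[i].mac, mac, ETH_ALEN) == 0) {
			v->slot[i].port = port;  /* address moved ports */
			return 0;
		}
		if (!v->slot[i].used && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1;
	memcpy(v->slot[free_slot].mac, mac, ETH_ALEN);
	v->slot[free_slot].port = port;
	v->slot[free_slot].used = true;
	return 0;
}
```

In this model, until somebody calls veb_fdb_add() with the MAC of VM5's
vNIC and port 2, a frame from VM4 to VM5 falls through to PORT_UPLINK,
which is the hole described above. Once the entry is pushed down, the
same lookup resolves to VF 2.
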

This hole has existed for a very long time, years now. It'd be nice to get
it fixed. If the community direction is to extend the current bridging
interface then that's fine, we'll go that way.

- Greg