Message-ID: <3A5015FE9E557D448AF7238AF0ACE20A2D8ADC24@IRVEXCHMB11.corp.ad.broadcom.com>
Date:	Wed, 4 Mar 2015 21:51:10 +0000
From:	David Christensen <davidch@...adcom.com>
To:	John Fastabend <john.fastabend@...il.com>
CC:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"Jiří Pírko (jiri@...nulli.us)" 
	<jiri@...nulli.us>
Subject: RE: Switchdev Application to SR-IOV NICs

> >       +-----+-----+-----+
> >       | vm0 | vm1 | vm2 | Virtual
> >       | eth0| eth0| eth0| Machines
> > +-----+--|--+--|--+--|--+----------
> > |sw0p0 sw0p1 sw0p2 sw0p3| Kernel
> > +--|-----|-----|-----|--+----------
> > | pf0   vf0   vf1   vf2 | PCIe
> > +--|-----|-----|-----|--+----------
> > | ++-----+-----+-----++ | SR-IOV NIC
> > | | VEB               | |
> > | +------------+------+ |
> > | SR-IOV NIC   |        |
> > +--------------|--------+
> >                 |
> >                PHY

I did wonder if a KVM virtualization use case is the only one to 
consider. What about a container model?  This could be very
confusing to users with all of the "phantom" sw0p* devices visible
alongside the eth* devices.

> 
> That looks good to me. I might add one more netdev to represent the
> egress port, though. This could be used to submit control traffic
> that should not, by spec, be sent through a VEB. For example STP,
> LLDP, etc. At the moment we send this traffic on sw0p0, which is
> not exactly correct.
>

How do we separate hypervisor/eth0 from hypervisor/sw0p0 traffic
in this case?  Lots of corner cases to consider if we take that
path.

> I had some prototype code at one point that did this; I can dig it
> up if folks think it's useful.
> 
> Also it might be worth noting that the "Kernel" net_devices are not
> actually bound to the virtual functions but are multiplexed/demuxed
> over the physical function pf0 in the diagram. The diagram might
> be read to imply some PCIe relationship between sw0p3 and vf2.
>
> >    +-----+
> >    | vm0 |
> >    | eth0|
> >    +--|--+
> >    |sw0p1|
> >    +--|--+
> >    | vf0 |
> > +----|----+
> > | +--+--+ |
> > | | VEB | |
> > | +-----+ |
> > +---------+
> >
> > It's unclear to me when traffic egressing the VEB should terminate
> > at sw0p1 vs. vm0's eth0.  They both represent the same MAC/VLAN.
> > Similarly, for traffic egressing vm0's eth0, when should it
> > terminate at sw0p1 vs. the VEB?
> >
> > Can anyone offer an alternate diagram for switchdev on an SR-IOV NIC?
> >
> 
> One approach would be to treat it like the switch case where instead
> of a physical port you have a VF. In this case, if you xmit a packet on
> sw0p1 it is sent to eth0. Then if vm0 (eth0) xmits a packet it enters
> the VEB. The only way to get packets onto sw0p1 is to use a rule to
> either "trap" or "mirror" packets to the "CPU sw0p1 port". Maybe a
> better name would be "hypervisor sw0p1 port". This would be analogous
> to the switch case. I have experimented with adding this support to
> the Flow API I'm working on but have not implemented it on rocker yet.
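
Just to check I'm reading that right, the sketch below is roughly what
I picture for such a trap rule.  It's hypothetical C with made-up names
(not your Flow API and not rocker code), only to pin down the
match/action idea:

#include <stdint.h>

enum veb_rule_action {
        VEB_ACT_FORWARD,        /* normal VEB forwarding */
        VEB_ACT_TRAP,           /* steal the frame to the hypervisor port */
        VEB_ACT_MIRROR,         /* copy to the hypervisor port, still forward */
};

struct veb_rule {
        /* match */
        uint8_t  dmac[6];
        uint16_t vlan;
        uint32_t ingress_port;          /* e.g. vf0's VEB port */
        /* action */
        enum veb_rule_action action;
        uint32_t dest_port;             /* the "hypervisor sw0p1 port" */
};

/*
 * Example: trap LLDP frames arriving from vf0 up to sw0p1 instead of
 * letting the VEB handle them.  01:80:c2:00:00:0e is the LLDP
 * multicast address; the port numbers are invented.
 */
static const struct veb_rule lldp_to_hypervisor = {
        .dmac         = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x0e },
        .vlan         = 0,
        .ingress_port = 1,      /* vf0 */
        .action       = VEB_ACT_TRAP,
        .dest_port    = 0,      /* hypervisor-facing port behind sw0p1 */
};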

(Drawing slightly modified to match text above.)

>   +-----+      +-----+
>   |hyper|      | vm0 |
>   |visor|      | eth0|
>   +-----+      +-----+
>      |            |
>   +--|--+      +--|--+
>   |sw0p0|      |sw0p1|
>   +-----+      +-----+
>      |           |
>   +--|-----|-----|-----|--+
>   | ++-----+-----+-----++ |
>   | | VEB               | |
>   | +------------+------+ |
>   | SR-IOV NIC   |        |
>   +--------------|--------+
>                   |
>                  PHY
> 

This was my thought as well, but there's no real hardware connection
between sw0p1 and vm0/eth0, so I don't see how to forward frames
across that conceptual sw0p1<->eth0 interface.  It seems like a
different hardware interface is required:

+-----+ +-----+
|hyper| | vm0 |
|visor| | eth0|
+--|--+ +--|--+
|sw0p1|    |
+--|--+ +--|--+
| vf0'| | vf0 |
+--|--+-+--|--+
| ++-------++ |
| |   VEB   | |
| +---------+ |
+-------------+

When the VEB is handling L2 forwarding, packets entering the VEB with
vm0/eth0's MAC/VLAN would be forwarded to the vf0 interface and land
in the VM, even packets transmitted on sw0p1.

Things would become more interesting if the VEB were to implement an
OpenFlow forwarding model.  In that case, a packet that enters the VEB
with vm0/eth0's MAC/VLAN would be forwarded to the vf0 interface on
a flow hit and forwarded to the vf0' interface on a flow miss.
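
To make that hit/miss distinction concrete, here is a toy userspace
model of the lookup (the table contents, MAC/VLAN values, and names
are invented for illustration; this is not driver code):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum veb_port { PORT_VF0, PORT_VF0_PRIME };     /* vf0' = hypervisor side */

struct veb_key  { uint8_t dmac[6]; uint16_t vlan; };
struct veb_flow { struct veb_key key; enum veb_port out; };

/* One programmed flow: vm0/eth0's MAC+VLAN steered to vf0. */
static const struct veb_flow flows[] = {
        { { { 0x52, 0x54, 0x00, 0xaa, 0xbb, 0xcc }, 100 }, PORT_VF0 },
};

static enum veb_port veb_forward(const struct veb_key *k)
{
        for (size_t i = 0; i < sizeof(flows) / sizeof(flows[0]); i++)
                if (!memcmp(flows[i].key.dmac, k->dmac, 6) &&
                    flows[i].key.vlan == k->vlan)
                        return flows[i].out;    /* flow hit:  to vf0  */

        return PORT_VF0_PRIME;                  /* flow miss: to vf0' */
}

Note there is no ingress-port term in the key, which is why, in the
plain L2 forwarding case, frames transmitted on sw0p1 hit the same
entry and land back in the VM.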

Perhaps another way to draw it, more in line with your comment about
pf0 above, would be this:

+-------------+ +-----+
|hyper        | | vm0 |
|visor        | | eth0|
+--|--+-+--|--+ +--|--+
| eth0| |sw0p1|    |
+--|--+ +--|--+ +--|--+
| pf0 |----+    | vf0 |
+--|--+---------+--|--+
| ++---------------++ |
| |       VEB       | |
| +-----------------+ |
+---------------------+
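
In that picture I'd expect sw0p1 to be a pure representor with no PCIe
resources of its own, roughly as below.  The names are hypothetical
(loosely in the spirit of existing representor/DSA-style xmit paths,
not code from any driver):

#include <linux/netdevice.h>

/* Hypothetical private data for a "phantom" sw0pN netdev. */
struct swport_priv {
        struct net_device *pf_netdev;   /* pf0: the real PCIe function */
        u16 veb_port;                   /* VEB port this netdev represents */
};

static netdev_tx_t swport_start_xmit(struct sk_buff *skb,
                                     struct net_device *dev)
{
        struct swport_priv *priv = netdev_priv(dev);

        /*
         * No queues of its own: tag the frame with its VEB destination
         * port and push it out through pf0.  How that tag is carried
         * (TX descriptor field, inserted header, ...) is exactly the
         * per-NIC detail that makes this muxing possible or not.
         */
        /* ...write priv->veb_port into driver-specific TX metadata... */
        skb->dev = priv->pf_netdev;
        dev_queue_xmit(skb);

        return NETDEV_TX_OK;
}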


> here the link between sw0p2 and vm1 is a virtual function instead of a
> physical wire. And sw0p0 is the "CPU port" directly to the hypervisor.
> 
> Is that at all clear? Let me know and I can try to do a better
> write-up in the AM.

I think we need to decide what the relationship is between the VEB and 
the host bridging functions before we can settle on a topology.

1) Treat the VEB and host bridge/OVS as two separate switches and connect
   them through an uplink (pf0/sw0p0?).

Advantages: Fewer "phantom" devices in the design; works with more existing
            devices.
Disadvantages: Loss of metadata such as the VEB ingress port.

2) Treat the VEB and host bridge/OVS as a stacked switch.

Advantages: Simplified presentation to the kernel.
Disadvantages: More complex NIC/VEB design; definition of a stacking
               wire protocol to pass metadata (rough sketch below).

3) Other options?
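
For (2), the per-frame metadata I have in mind would be carried by
something like the tag below between the VEB and the host.  The layout
is entirely hypothetical, just to show what a stacking wire protocol
would have to pass:

#include <stdint.h>

struct veb_stack_tag {
        uint16_t ingress_port;  /* VEB port the frame arrived on (vfN, PHY, ...) */
        uint16_t flags;         /* e.g. trapped, mirrored, flow miss */
        uint32_t flow_id;       /* matched flow, if any, for the host to act on */
} __attribute__((packed));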

Dave
