Message-ID: <20191113182546.4db77a51@cakuba>
Date: Wed, 13 Nov 2019 18:25:46 -0800
From: Jakub Kicinski <jakub.kicinski@...ronome.com>
To: "Keller, Jacob E" <jacob.e.keller@...el.com>
Cc: "saeedm@...lanox.com" <saeedm@...lanox.com>,
"sbrivio@...hat.com" <sbrivio@...hat.com>,
"nikolay@...ulusnetworks.com" <nikolay@...ulusnetworks.com>,
"dsahern@...il.com" <dsahern@...il.com>,
"sd@...asysnail.net" <sd@...asysnail.net>,
"jiri@...lanox.com" <jiri@...lanox.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"stephen@...workplumber.org" <stephen@...workplumber.org>,
"lariel@...lanox.com" <lariel@...lanox.com>
Subject: Re: [PATCH net-next v2 0/3] VGT+ support
On Wed, 13 Nov 2019 22:55:16 +0000, Keller, Jacob E wrote:
> Hi Jakub,
>
> On Tue, 2019-11-05 at 17:38 -0800, Jakub Kicinski wrote:
> > In the upstream community, however, we care about the technical
> > aspects.
> >
> > > and we all know that it could take years before we can sit back and
> > > relax that we got our L2 switching ..
> >
> > Let's not be dramatic. It shouldn't take years to implement basic L2
> > switching offload.
>
> I had meant to send something earlier in this thread, but never got
> around to it. I wanted to ask your opinion and get some feedback.
>
> We (Intel) have recently been investigating use of port representors
> for enabling introspection and control of VFs in the host system after
> they've been assigned to a virtual machine.
Cool!
> I had originally been thinking of adding these port representor netdevs
> before we fully implement switchdev with the e-switch offloads. The
> idea was to migrate to using port representors in either case.
>
> However, from what it looks like on this thread, you'd rather propose
> that we implement switchdev with basic L2 offload?
>
> I'm not too familiar with switchdev (I'm trying to read and learn about
> it so that we can begin actually implementing it in our network drivers).
So switchdev mode for SR-IOV NICs basically means that all ports are
represented by a netdev and no implicit switching happens in HW: if a
packet is received on a port, be it VF or uplink, it's sent up to the
representor. That's pretty much it. Then you can install rules to
forward internally in the device.
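To make the default-miss behaviour concrete, here is a minimal toy model
in Python. It is only a sketch of the semantics described above, not
kernel or driver code: the class ToyESwitch, its methods, and the port
and netdev names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ToyESwitch:
    """Toy model of an SR-IOV e-switch in switchdev mode (illustration only)."""

    # Maps each hardware port (VF or uplink) to its representor netdev name.
    representors: Dict[str, str]
    # Explicitly installed rules: (ingress port, destination MAC) -> egress port.
    rules: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add_rule(self, in_port: str, dst_mac: str, out_port: str) -> None:
        """Install a rule that forwards matching packets inside the device."""
        if in_port not in self.representors or out_port not in self.representors:
            raise ValueError("unknown port")
        self.rules[(in_port, dst_mac)] = out_port

    def receive(self, in_port: str, dst_mac: str) -> str:
        """Return where a packet received on in_port ends up."""
        out_port = self.rules.get((in_port, dst_mac))
        if out_port is not None:
            # A rule matched: the packet is forwarded internally in the device.
            return f"hw:{out_port}"
        # No implicit switching: a miss goes up to the ingress port's representor.
        return f"host:{self.representors[in_port]}"
```

With no rules installed, every packet lands on the representor of the
port it arrived on; only after add_rule() does the toy device forward
anything by itself.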
Perhaps an obvious suggestion but did you consider converting
Documentation/networking/switchdev.txt to ReST and updating it as you
explore the code? ;)
> Based on your comments and feedback in this thread, it sounds like our
> original idea to have a "legacy with port representors" mode is not
> really something you'd like, because it would continue to encourage
> avoiding migrating from legacy stack to switchdev model.
Not at this point, no.

I think this was all discussed before with Alex still strongly in the
netdev loops at Intel. I was initially accommodating to some partial
implementations like what you mention, since it would have been good to
have Intel come on board with switchdev, and Fortville reportedly
couldn't disable leaking the packets which missed filters to the uplink.

IIRC at that point Or Gerlitz strongly drew the line at preserving
switchdev behaviour as described above: by default everything falls
back to the host.

Today, since many months have passed, I don't think we should walk back
on that decision.
> But, instead of trying to go fully towards implementing switchdev with
> complicated OvS offloads, we could do a simpler approach that only
> supports L2 offloads initially, and from these comments it seems this
> is the direction you'd rather upstream pursue?
Yes, I think simple L2 offload that supports common cases would be a
pretty cool starting point for a new switchdev implementation.
> > I had given switchdev L2 some thought. IDK what you'd call serious,
> > I don't have code. We are doing some ridiculously complex OvS
> > offloads
> > while most customers just need L2 plus maybe VXLAN encap and basic
> > ACLs. Which switchdev can do very nicely thanks to Cumulus folks.
>
> Based on this, it sounds like the switchdev API can do this L2
> offloading and drivers simply need to enable it. If I understand
> correctly, it requires the system administrator to place the VF devices
> into a bridge, rather than simply having the bridging hidden inside the
> device.
Yup. You may want to support offloading only certain configurations
of the bridge to simplify your life, e.g. disable learning and flood
only to uplink.
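As a rough sketch of what "only offload certain configurations" could
look like, the Python below checks a bridge's port settings against
that example restriction. It is a toy, not the switchdev API: the
BridgePort type, its fields, and can_offload() are hypothetical, and
the reading of the restriction (learning off on every port, flooding
enabled on exactly one uplink port and nowhere else) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BridgePort:
    """Hypothetical view of one bridge port's settings."""

    name: str
    is_uplink: bool
    learning: bool  # MAC learning enabled on this port
    flood: bool     # unknown-destination traffic may be flooded to this port


def can_offload(ports: List[BridgePort]) -> bool:
    """Accept only the simplified configuration; otherwise stay in software."""
    uplinks = [p for p in ports if p.is_uplink]
    if len(uplinks) != 1:
        # Assumption: this toy device has exactly one uplink port.
        return False
    for p in ports:
        if p.learning:
            # Learning must be disabled everywhere in this sketch.
            return False
        if p.flood != p.is_uplink:
            # Flooding is allowed towards the uplink and nowhere else.
            return False
    return True
```

A driver following this pattern would decline anything it cannot
represent, so that unsupported bridge setups keep working through the
software path.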