Message-ID: <CAKgT0Uc9SmH_W5j30Bux2UNXtPH=MR6qwrFbtkbPHTWirn8nmg@mail.gmail.com>
Date: Wed, 15 Nov 2017 10:25:10 -0800
From: Alexander Duyck <alexander.duyck@...il.com>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc: Or Gerlitz <gerlitz.or@...il.com>,
David Miller <davem@...emloft.net>,
Anjali Singhai Jain <anjali.singhai@...el.com>,
Andy Gospodarek <gospo@...adcom.com>,
Michael Chan <michael.chan@...adcom.com>,
Simon Horman <simon.horman@...ronome.com>,
John Fastabend <john.fastabend@...il.com>,
Saeed Mahameed <saeedm@...lanox.com>,
Jiri Pirko <jiri@...lanox.com>,
Rony Efraim <ronye@...lanox.com>,
Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: SRIOV switchdev mode BoF minutes
On Tue, Nov 14, 2017 at 8:02 PM, Jakub Kicinski
<jakub.kicinski@...ronome.com> wrote:
> On Tue, 14 Nov 2017 19:04:36 -0800, Alexander Duyck wrote:
>> On Tue, Nov 14, 2017 at 3:36 PM, Jakub Kicinski
>> <jakub.kicinski@...ronome.com> wrote:
>> > On Tue, 14 Nov 2017 15:05:08 -0800, Alexander Duyck wrote:
>> >> >> We basically need to do some feasibility research to see if we can
>> >> >> actually meet all the requirements for switchdev on i40e. We have been
>> >> >> getting mixed messages where we are given a great many "yes, but" type
>> >> >> answers. For i40e we are looking into it but I don't have high
>> >> >> confidence in our ability to actually support it in hardware/firmware.
>> >> >> If it were as easy as you have been led to believe, we would have done
>> >> >> it months ago when we were researching the requirements to support switchdev
>> >> >
>> >> > wait, Sridhar made seven rounds of his submission (this is the v7
>> >> > pointer [1]) and you
>> >> > still don't know if what you were attempting to push upstream can
>> >> > work, something is
>> >> > weird here, can you clarify? Jeff?
>> >>
>> >> Not weird so much as stubborn. The patches were being pushed based on
>> >> the assumption that the community would accept a NIC generating port
>> >> representors that didn't necessarily pass traffic, and then even when
>> >> we had them passing traffic the PF still wasn't configured to handle
>> >> being the default destination for traffic without any rules
>> >> associated, instead VFs would directly send to the outside world.
>> >
>> > Perhaps the way forward is to lift the requirement on passing traffic,
>> > as long as the limitation is clearly expressed to the users.
>>
>> No, I am not arguing for that because then SwitchDev will fall into
>> disarray. If we want to have a strict definition for what is SwitchDev
>> and what isn't I am okay with that. It gives us a definition of what
>> our hardware needs to do in order to support it and without that we
>> are going to get hardware that just bends the rules to claim support
>> for it.
>
> Let me make sure we understand each other. The switchdev SR-IOV mode is
> what happens when user requests DEVLINK_ESWITCH_MODE_SWITCHDEV. Are you
> saying you are opposed to adding DEVLINK_ESWITCH_MODE_VEPA?
I wouldn't say I am opposed to that idea. We just need to clearly
define what MODE_VEPA is. I would say that even in MODE_VEPA we would
be passing traffic. The limitation though is that we wouldn't have the
same mechanisms in place to route the traffic.
The big issue with VEPA is that the traffic is routed to an external
entity before it makes a hairpin turn and comes back. As such we don't
have the actual origin of the packet to work with other than MAC and
VLAN. As far as directing a packet to a specific port the only way we
really have of doing that is to direct it to the MAC/VLAN pair for the
VF. This is one of the reasons why I am thinking source mode macvlan
is the solution to go with for something like this. Basically the
source mode macvlan can get pretty close to identifying the origin of
any packet that came from the VF assuming it is programmed with all
the MAC entries belonging to the VF. The only case where this doesn't
work is the "trusted" legacy mode VF that is running in promiscuous
mode with anti-spoof disabled.
>> All I am asking for is for us to not close the door to the possibility
>> of adding features to legacy SR-IOV. I am hoping to use a source
>> macvlan based approach to make it so that we can support "port
>> representors" for devices that can't support full SwitchDev. The idea
>> would be to use them to get as close to SwitchDev level support on
>> legacy devices as possible without using full SwitchDev. That should
>> solve a good part of the issue, but I am pretty certain I need to be
>> able to extend legacy SR-IOV in order to support it. I had talked with
>> Jiri at netdev 2.1 about it back when we had submitted the v7 patches,
>> and the decision was to look at doing "port representors" but not
>> associate them with SwitchDev. I was out on sabbatical for most of the
>> summer and I am just now starting on the macvlan work I had planned. I
>> hope to have it done before the next netdev and then we can discuss it
>> there if it needs more discussion than what we can have on the mailing
>> list.
>
> I don't know what you mean with the macvlan based approach. Could you
> perhaps describe it in more detail? Will it allow users to configure
> forwarding and queueing with existing, standard tools and APIs?
So there are a few issues with our devices doing SwitchDev mode that I
am trying to address.
One of the issues is that we have no direct way to figure out where
the packets are coming from as I described above. So instead of us
implementing multiple approaches for the same thing my thought was to
look at using source mode macvlan which does filtering on the source
MAC address instead of the destination. It shouldn't take much to
extend it so that a PF could notify a source mode macvlan interface of
all the unicast addresses a VF can use as a source address for
transmitting. With that we would at least be able to tell where the
traffic came from.
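
As a rough illustration of the source-filtering idea above, the lookup
could be modeled as follows. This is only a toy sketch in Python, not
kernel code: the class name, the port names, and the "pf" default are
all hypothetical, and it assumes the PF has already reported every
unicast address each VF may use as a source.

```python
# Toy model of source-mode filtering: a received frame is classified by
# its *source* MAC rather than its destination. All names below are
# hypothetical and do not correspond to any kernel API.

def normalize(mac: str) -> str:
    """Canonicalize a MAC string so lookups are case-insensitive."""
    return mac.lower().replace("-", ":")


class SourceModeDemux:
    def __init__(self, default_port: str = "pf"):
        # Frames from unknown sources fall back to the default port.
        self.default_port = default_port
        self.src_to_port = {}

    def add_source(self, port: str, mac: str) -> None:
        """Record that `mac` is a valid source address for `port`."""
        self.src_to_port[normalize(mac)] = port

    def classify(self, src_mac: str) -> str:
        """Return the representor that should receive this frame."""
        return self.src_to_port.get(normalize(src_mac), self.default_port)


demux = SourceModeDemux()
# Assumed notification from the PF: VF 0 owns two addresses, VF 1 owns one.
demux.add_source("vf0_rep", "02:00:00:00:00:01")
demux.add_source("vf0_rep", "02:00:00:00:00:02")
demux.add_source("vf1_rep", "02:00:00:00:01:01")
```

A frame whose source address is not in the table lands on the default
port, which is also what would happen for the "trusted" promiscuous VF
with anti-spoof disabled, since its source addresses cannot all be
known in advance.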
Another issue is directing transmit packets to the VF for any specific
interface. My thought for our source mode based "port representor"
macvlan is to limit the transmits so that we can only transmit
unicast packets that are guaranteed to be delivered to the proper
destination. Basically we would have to tag all broadcast and
multicast packets as being already forwarded and they would have to be
dropped on the "port representor" interfaces. Ideally there would be
some sort of uplink representor that would then be able to handle the
broadcast/multicast packets for the device since we end up replicating
the packets across all ports on the same VLAN currently.
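
The transmit restriction described above could be sketched like this.
Again this is a hypothetical Python model rather than driver code: the
function names and the "forward"/"drop" verdict strings are made up for
illustration, and it assumes "guaranteed to be delivered" means the
destination is one of the unicast addresses known to belong to the VF.

```python
# Toy model of the transmit policy on a source-mode "port representor".
# Function names and verdict strings are hypothetical.

def is_group_address(mac: str) -> bool:
    """True for multicast and broadcast: the low bit of the first octet is set."""
    return bool(int(mac.split(":")[0], 16) & 0x01)


def representor_tx_verdict(dst_mac: str, vf_macs: set) -> str:
    """Decide what a representor does with a frame it is asked to send."""
    if is_group_address(dst_mac):
        # Broadcast/multicast is treated as already forwarded; in this
        # model an uplink representor would handle it instead.
        return "drop"
    if dst_mac.lower() in vf_macs:
        # Unicast to an address the VF is known to own.
        return "forward"
    # Unicast to an unknown address cannot be guaranteed to reach the VF.
    return "drop"


vf0_macs = {"02:00:00:00:00:01", "02:00:00:00:00:02"}
```

With this policy only `representor_tx_verdict("02:00:00:00:00:01",
vf0_macs)` style unicast sends go through; anything with the group bit
set is dropped on the representor.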
The last issue is that by default all transmits that don't have a
matching filter in hardware are transmitted out the uplink port. That
was part of the issue that we don't think can be solved for ixgbe, and
even with a firmware change I am not certain how well i40e will work
for this. With macvlan being used as the model we basically skirt the
whole issue since that is kind of the standard behavior for macvlan
anyway.
In theory this all should work together to allow forwarding with the
existing tools. It would basically just mean we need to use FDB
programming on the port representor to control what MAC addresses are
handled for each interface. In addition we could probably handle the
ndo_setup_tc call in the port representors with some limited subset of
fields supported by flower to use that to route traffic.
It will be much easier to show all this once I have code. It will
probably take me a month or so to dig out the technical debt that is
currently present for macvlan offload, and the fact that i40e
currently doesn't support it. Once I get those two items addressed my
plan is to then start tackling the source mode macvlan based port
representors. I hope to have an RFC ready early next year.
Thanks.
- Alex