Message-ID: <CAKgT0UdbXSsMSD_pLLgPSU5xOGk6Dc-0cAQNQ70LZxkcc6w4Mw@mail.gmail.com>
Date:   Tue, 14 Nov 2017 15:05:08 -0800
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Or Gerlitz <gerlitz.or@...il.com>
Cc:     David Miller <davem@...emloft.net>,
        Anjali Singhai Jain <anjali.singhai@...el.com>,
        Andy Gospodarek <gospo@...adcom.com>,
        Michael Chan <michael.chan@...adcom.com>,
        Simon Horman <simon.horman@...ronome.com>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>,
        John Fastabend <john.fastabend@...il.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Jiri Pirko <jiri@...lanox.com>,
        Rony Efraim <ronye@...lanox.com>,
        Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: SRIOV switchdev mode BoF minutes

On Tue, Nov 14, 2017 at 1:50 PM, Or Gerlitz <gerlitz.or@...il.com> wrote:
> On Tue, Nov 14, 2017 at 10:00 PM, Alexander Duyck
> <alexander.duyck@...il.com> wrote:
>> On Tue, Nov 14, 2017 at 8:44 AM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>>> On Mon, Nov 13, 2017 at 7:10 PM, Alexander Duyck
>>> <alexander.duyck@...il.com> wrote:
>>>> On Sun, Nov 12, 2017 at 10:16 PM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>>>>> On Sun, Nov 12, 2017 at 10:38 PM, Alexander Duyck
>>>
>>>>> What we call the slow path requirements are the following:
>>>>>
>>>>> 1. xmit on VF rep always turns to a receive on the VF, regardless of
>>>>> the offloaded SW steering rules ("send-to-vport")
>>>>>
>>>>> 2. xmit on VF which doesn't meet any offloaded SW steering rules must
>>>>> be received into the host OS from the VF rep
>>>
>>>>> 1,2 above must hold also for the uplink and the PF reps
>>>
>>>> I am well aware of the requirements. We discussed these with Jiri at
>>>> the previous netdev.
>>>
>>>>> When the i40e limitation was described at netdev, it seems you have a problem
>>>>> with VF xmit that should be turned into a recv on the VF rep but also
>>>>> goes to the wire.
>>>
>>>>> It smells as if a FW patch could solve that, doesn't it?
>>>
>>>> That is a huge maybe. We looked into it last time and while we can
>>>> meet requirements 1 and 2 we do so with a heavy performance penalty
>>>> due to the fact that we don't support anywhere near the same number of
>>>> flows as a true switch. Also while that might work for i40e
>>>
>>> To recap on i40e: you can support the slow path requirements, but you have an
>>> issue with the fast path (== offloaded flows)? What is the issue there?
>>
>> We basically need to do some feasibility research to see if we can
>> actually meet all the requirements for switchdev on i40e. We have been
>> getting mixed messages where we are given a great many "yes, but" type
>> answers. For i40e we are looking into it but I don't have high
>> confidence in our ability to actually support it in hardware/firmware.
>> If it were as easy as you have been led to believe, we would have done
>> it months ago when we were researching the requirements to support switchdev.
>
> Wait, Sridhar made seven rounds of his submission (this is the v7 pointer [1])
> and you still don't know if what you were attempting to push upstream can
> work? Something is weird here, can you clarify? Jeff?

Not weird so much as stubborn. The patches were being pushed based on
the assumption that the community would accept a NIC generating port
representors that didn't necessarily pass traffic. Even when we had
them passing traffic, the PF still wasn't configured to handle being
the default destination for traffic without any associated rules;
instead, VFs would send directly to the outside world.
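
To make the gap concrete, here is a minimal sketch of the two slow path
rules quoted earlier next to the behaviour described in this paragraph.
It is a toy model only: the names (Port, switchdev_destination,
early_patch_destination) are made up for illustration and do not come
from i40e or any kernel API, and the "early patch" function is just one
reading of the paragraph above.

```python
from enum import Enum
from typing import Optional


class Port(Enum):
    VF = "vf"          # the virtual function itself
    VF_REP = "vf_rep"  # its representor netdev on the host
    UPLINK = "uplink"  # the wire


def switchdev_destination(ingress: Port, rule_dest: Optional[Port] = None) -> Port:
    """Where a frame should land under the two slow path requirements."""
    if ingress is Port.VF_REP:
        # Requirement 1: xmit on the VF rep is always received on the VF,
        # regardless of any offloaded steering rule.
        return Port.VF
    if ingress is Port.VF:
        # Fast path if an offloaded rule matched; otherwise requirement 2:
        # the frame must show up on the host via the VF rep.
        return rule_dest if rule_dest is not None else Port.VF_REP
    raise ValueError(f"ingress {ingress} is not modelled in this sketch")


def early_patch_destination(ingress: Port, rule_dest: Optional[Port] = None) -> Port:
    """Assumed model of the behaviour described above: unmatched VF
    traffic goes straight to the wire instead of to the host."""
    if ingress is Port.VF and rule_dest is None:
        return Port.UPLINK
    return switchdev_destination(ingress, rule_dest)
```

The only case where the two functions differ is a VF transmit with no
matching rule, which is exactly the default-destination problem.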

> Sridhar, maybe you can explain if/what wrong assumptions you had in your code
> and what you think is the gap to address them and come up with proper
> impl for i40e?
>
> [1] https://marc.info/?l=linux-netdev&m=149083338400922&w=2

For starters the firmware change you are talking about didn't exist
during this time frame. We can ignore those patches as they assumed
that port representors didn't necessarily have to pass traffic.

>> In addition i40e isn't really my concern. I am much more
>> concerned about ixgbe as it has a much larger install base and many
>> more customers that are still buying it today.
>>
>>>> we still have a much larger install base of ixgbe ports that we have to support.
>>>
>>> ok, but support is one thing, and continuing to enhance a ten-year-old
>>> wrong SW model is another
>>
>> The model might be 10 years old, but as I said we are still shipping
>> new silicon that was released just over a year ago that is supported
>> by the ixgbe driver.
>>
>> Also I don't know if the term "enhancing" is the right word for what I
>> am thinking. I'm not talking about adding new drivers that only
>> support legacy mode.  We are looking at probably having to refactor
>> the whole concept of "trusted" VF in order to break it out into
>> smaller buckets. In addition I plan to come up with a source mode
>> macvlan based "port representor" for legacy SR-IOV and hope to be able
>> to use that to start working on a better path for SR-IOV live
>> migration.
>>
>> Fundamentally the problem I have with us saying we cannot extend
>> legacy mode SR-IOV is that 82599 is a very large piece of the existing
>> install base for 10Gbit in general. We have it shipping on brand new
>> platforms as the silicon that is installed on the motherboard. With
>> that being the case people are going to want to get the most value
>> they can out of the silicon that they purchased since in many cases it
>> is just a standard part of the platform.
>
> Getting the most value still doesn't mean you should approach the community
> and ask to keep enhancing a wrong SW model for a switch.
>
> For example, suppose a new single-bit module param to IXGBE would get you to
> sell another 100K or 1M or 10M pieces per year, but we as a community decided
> that module params are not the way to go. Would you come and ask to add the
> module param so you could get more biz?

The problem is that is how things have been done in the past. I don't
want us going down that road. That is half of my frustration with how
things have been done. Even worse is how debugfs has been mis-used.
I'm trying to keep us from committing to an agreement that we won't
abide by.

>> I'm not saying we have new parts. I'm saying we have existing parts
>> that will likely need some work done. SwitchDev was only introduced
>> about 2 years ago. We have parts that were released around or before
>> then with functionality that didn't anticipate this. We still haven't
>> finished fully implementing all the features that were available on
>> the parts, that is what I am arguing. Usually new features go in for
>> several years after a part is released, usually something in the 3 to
>> 5 year range.
>
>> When SR-IOV was introduced there were two available modes, Virtual
>> Ethernet Port Aggregation, aka VEPA, and Virtual Ethernet Bridging,
>> aka VEB. The fact is SwitchDev is designed specifically for networking
>> SR-IOV with Virtual Ethernet Bridging, aka VEB. You argue that the
>> legacy model is bad, but I would argue that is because the legacy
>> model was really designed to work more for VEPA than with VEB,
>> whereas SwitchDev only focuses on VEB. If you take a look in the ixgbe
>> or i40e drivers you will see that we support configuring both of those
>> modes via ndo_bridge_setlink since we have customer install bases that
>> actually prefer VEPA over VEB as they prefer to have their traffic
>> centrally managed instead of having the local host managing the
>> traffic. We cannot just arbitrarily tell our customers they are doing
>> SR-IOV using the "wrong model".
>>
>> I would rather not have SwitchDev become the next SystemD. The type of
>> argument you are making is basically dictating to us and our customers
>> how things are supposed to work based on your view of things. We have
>> different hardware, different customers, and all of our needs aren't
>> necessarily met by SwitchDev. I would agree that SwitchDev is the
>> go-to solution for VEB configuration, and we do plan to have future
>> hardware support it. In addition I would argue that for the sake of
>> consistency we should make sure that any feature that gets added to
>> the legacy model has to be supported by the SwitchDev model as well
>> before it could be supported. If anything my hope is to evolve the legacy
>> model to have much of the same look and feel as SwitchDev, but that
>> will take time and require changes to the legacy model.
>>
>> I don't plan to have a ton of new features added to legacy SR-IOV, as
>> I stated earlier my main concern is the "trusted" VF mode as that has
>> become a security issue as everything is getting dumped into that so
>> we need to break it up to get finer granularity. For example I am
>> looking at adding a promisc/allmulti/multicast/broadcast control per
>> VF to set the upper limit of what a VF can request to receive instead
>> of just turning on "trusted" to allow a VF to turn on promiscuous. My
>> only other concern is live migration. I don't know if that will
>> require changes to the legacy SR-IOV mode or not, but it would be
>> better to not have that door closed as an option than to have to work
>> around it entirely.
>>
>> So, to summarize:
>> 1. VEPA is still a thing, that implies no e-switch. Switchdev does not
>> address that model.
>> 2. I agree that SwitchDev is the way forward for VEB.
>> 3. I agree we should focus on interface consistency so any new feature
>> added to legacy mode has to also be enabled in SwitchDev.
>>
>> I hope this makes my point a bit clearer. I don't fundamentally
>> disagree with the need to focus on having a consistent UAPI going
>> forward. The only spot where we have issues is that I don't see
>> SwitchDev as the only solution as we still have customers that aren't
>> necessarily making use of an eswitch and telling them they are "doing
>> it wrong" isn't really a viable solution. If nothing else I think we
>> can look at re-evaluating this at the next netdev/netconf, and for now
>> I would agree legacy SR-IOV changes should be under greater scrutiny.
>
> Alex,
>
> Lots of data and argumentation. It's too bad that none of it was
> said/presented at the last netdev/netconf, nor in the previous conferences
> (Feb 2016 / Oct 2016) when SRIOV switchdev was on the stage, nor in the
> submissions that followed. These don't seem to be new data points, at least
> to you. As you said, the switchdev mode for SRIOV has been around for two
> years (merged in 4.8 but was presented way back). You waited two years to
> provide this input and we will have to wait another 6 months for you to
> conduct a session on that.

This is the first time you have essentially said SwitchDev is
the only way things are going to be done going forward. In addition I
don't recall you ever using wording that basically called the
legacy model bad for SR-IOV. That is why I have been okay with it up
until now.

> Can you point out public use-cases / white-papers / design documents /
> blueprints / etc. that employ the VEPA approach? b/c really no other
> person/vendor brought it up... we

Cisco and HP were the two vendors that were pushing it hard for a
while there. It isn't anywhere near as popular as VEB is, but from the
looks of it Cisco is still pushing a variant on it in the form of
vntag. If nothing else you can go look at the 802.1Qbg IEEE spec as it
is called out there as well.
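
For anyone following along who has not dealt with both modes, a rough
sketch of the forwarding difference, following the usual 802.1Qbg
reading of VEB and VEPA. The function and string labels are invented
for illustration and are not taken from ixgbe, i40e, or
ndo_bridge_setlink.

```python
def first_hop(mode: str, dst_is_local_vf: bool) -> str:
    """Return where a frame sent by a VF is switched first.

    mode is "veb" or "vepa"; dst_is_local_vf says whether the
    destination is another VF/PF on the same physical port.
    """
    if mode == "veb":
        # VEB: the NIC's embedded bridge switches local traffic itself
        # and only sends non-local traffic out the uplink.
        return "embedded bridge" if dst_is_local_vf else "uplink"
    if mode == "vepa":
        # VEPA: every frame goes out the uplink; the adjacent external
        # bridge decides, and hairpins local traffic back to the port.
        return "uplink"
    raise ValueError(f"unknown mode: {mode}")
```

In VEPA mode the result never depends on the destination, which is why
there is no host-managed e-switch to program in that model.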

> were all dealing with the sriov e-switch as a HW switch which should be
> programmed by the host stack according to well known industry models that
> apply on physical switches, e.g.
>
> 1. L2 FDB (Linux Bridge)
> 2. L3 FIB (Linux Routers)
> 3. ACLs (Linux TC)
>
> [3] is what is implemented by the upstream sriov switchdev drivers, [1] and [2] we
> discussed on netdev, maybe you want to play with [1] for i40e? I had a slide on
> that in the BoF
>
> Or.

So for i40e we will probably explore option 1, and possibly option 3
though as I said we still have to figure out what we can get the
firmware to actually do for us. That ends up being the ultimate
limitation.

- Alex
