Message-ID: <CAJ3xEMg17nLCR4dgLev8vLxTwj2p27HUWZ2C_Vsbqes=eotKZw@mail.gmail.com>
Date:   Tue, 14 Nov 2017 23:50:32 +0200
From:   Or Gerlitz <gerlitz.or@...il.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     David Miller <davem@...emloft.net>,
        Anjali Singhai Jain <anjali.singhai@...el.com>,
        Andy Gospodarek <gospo@...adcom.com>,
        Michael Chan <michael.chan@...adcom.com>,
        Simon Horman <simon.horman@...ronome.com>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>,
        John Fastabend <john.fastabend@...il.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Jiri Pirko <jiri@...lanox.com>,
        Rony Efraim <ronye@...lanox.com>,
        Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: SRIOV switchdev mode BoF minutes

On Tue, Nov 14, 2017 at 10:00 PM, Alexander Duyck
<alexander.duyck@...il.com> wrote:
> On Tue, Nov 14, 2017 at 8:44 AM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>> On Mon, Nov 13, 2017 at 7:10 PM, Alexander Duyck
>> <alexander.duyck@...il.com> wrote:
>>> On Sun, Nov 12, 2017 at 10:16 PM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>>>> On Sun, Nov 12, 2017 at 10:38 PM, Alexander Duyck
>>
>>>> What we call the slow path requirements are the following:
>>>>
>>>> 1. xmit on VF rep always turns to a receive on the VF, regardless of
>>>> the offloaded SW steering rules ("send-to-vport")
>>>>
>>>> 2. xmit on VF which doesn't meet any offloaded SW steering rules must
>>>> be received into the host OS from the VF rep
>>
>>>> 1,2 above must hold also for the uplink and the PF reps
>>
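As a reading aid, the two slow path requirements quoted above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name, the port naming scheme ("vf0", "vf0_rep", "uplink") and the flow keys are hypothetical and do not come from any driver or from the switchdev API.

```python
# Toy model of the two slow path requirements. All names are hypothetical.
class ToyESwitch:
    def __init__(self):
        # (source port, flow key) -> destination port; stands in for the
        # offloaded SW steering rules
        self.offloaded = {}

    def offload(self, src, flow, dst):
        """Install an offloaded steering rule for traffic sent by src."""
        self.offloaded[(src, flow)] = dst

    def xmit_from_rep(self, rep, flow):
        """Requirement 1 ("send-to-vport"): xmit on a rep is always
        received on the port it represents; rules are not consulted."""
        if not rep.endswith("_rep"):
            raise ValueError("not a representor: " + rep)
        return rep[: -len("_rep")]

    def xmit_from_port(self, port, flow):
        """Requirement 2: xmit that matches no offloaded rule is received
        into the host OS from the port's rep."""
        return self.offloaded.get((port, flow), port + "_rep")


sw = ToyESwitch()
sw.offload("vf0", "tcp:80", "uplink")
print(sw.xmit_from_rep("vf0_rep", "tcp:80"))  # delivered to the VF itself
print(sw.xmit_from_port("vf0", "tcp:80"))     # rule hit: offloaded path
print(sw.xmit_from_port("vf0", "udp:53"))     # rule miss: slow path via rep
```

The same two methods apply unchanged to "uplink"/"uplink_rep" and "pf"/"pf_rep", which is what the quoted text means by requirements 1 and 2 also holding for the uplink and PF reps.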
>>> I am well aware of the requirements. We discussed these with Jiri at
>>> the previous netdev.
>>
>>>> When the i40e limitation was described at netdev, it seems you have a problem
>>>> with VF xmit that should be turned to be a recv on the VF rep but also
>>>> goes to the wire.
>>
>>>> It smells as if a FW patch can solve that, doesn't it?
>>
>>> That is a huge maybe. We looked into it last time and while we can
>>> meet requirements 1 and 2 we do so with a heavy performance penalty
>>> due to the fact that we don't support anywhere near the same number of
>>> flows as a true switch. Also while that might work for i40e
>>
>> to recap on i40e, you can support the slow path requirements, but  you have an
>> issue with the fast path (== offloaded flows)? what is the issue there?
>
> We basically need to do some feasibility research to see if we can
> actually meet all the requirements for switchdev on i40e. We have been
> getting mixed messages where we are given a great many "yes, but" type
> answers. For i40e we are looking into it but I don't have high
> confidence in our ability to actually support it in hardware/firmware.
> If it were as easy as you have been led to believe, we would have done
> it months ago when we were researching the requirements to support switchdev

Wait, Sridhar made seven rounds of his submission (this is the v7
pointer [1]) and you still don't know if what you were attempting to
push upstream can work? Something is weird here, can you clarify? Jeff?

Sridhar, maybe you can explain whether your code made wrong assumptions,
which ones, and what gap you think must be closed to come up with a
proper impl for i40e?

[1] https://marc.info/?l=linux-netdev&m=149083338400922&w=2


> In addition i40e isn't really my concern. I am much more
> concerned about ixgbe as it has a much larger install base and many
> more customers that are still buying it today.
>
>>> we still have a much larger install base of ixgbe ports that we have to support.
>>
>> ok, but support is one thing, and continuing to enhance a ten-year-old
>> wrong SW model is another
>
> The model might be 10 years old, but as I said we are still shipping
> new silicon that was released just over a year ago that is supported
> by the ixgbe driver.
>
> Also I don't know if the term "enhancing" is the right word for what I
> am thinking. I'm not talking about adding new drivers that only
> support legacy mode.  We are looking at probably having to refactor
> the whole concept of "trusted" VF in order to break it out into
> smaller buckets. In addition I plan to come up with a source mode
> macvlan based "port representor" for legacy SR-IOV and hope to be able
> to use that to start working on a better path for SR-IOV live
> migration.
>
> Fundamentally the problem I have with us saying we cannot extend
> legacy mode SR-IOV is that 82599 is a very large piece of the existing
> install base for 10Gbit in general. We have it shipping on brand new
> platforms as the silicon that is installed on the motherboard. With
> that being the case people are going to want to get the most value
> they can out of the silicon that they purchased since in many cases it
> is just a standard part of the platform.

Getting the most value still doesn't mean you should approach the
community and ask to keep enhancing a wrong SW model for a switch.

For example, suppose a single new one-bit module param for ixgbe would
let you sell another 100K or 1M or 10M pieces per year, but we as a
community decided that module params are not the way to go. Would you
come and ask to add the module param so that you can get more biz?


> I'm not saying we have new parts. I'm saying we have existing parts
> that will likely need some work done. SwitchDev was only introduced
> about 2 years ago. We have parts that were released around or before
> then with functionality that didn't anticipate this. We still haven't
> finished fully implementing all the features that were available on
> the parts, that is what I am arguing. Usually new features go in for
> several years after a part is released, usually something on the 3 to
> 5 year range.

> When SR-IOV was introduced there were two available modes, Virtual
> Ethernet Port Aggregation, aka VEPA, and Virtual Ethernet Bridging,
> aka VEB. The fact is SwitchDev is designed specifically for networking
> SR-IOV with Virtual Ethernet Bridging, aka VEB. You argue that the
> legacy model is bad, but I would argue that is because the legacy
> model was really designed to work more for VEPA than for VEB,
> whereas SwitchDev only focuses on VEB. If you take a look in the ixgbe
> or i40e drivers you will see that we support configuring both of those
> modes via ndo_bridge_setlink since we have customer install bases that
> actually prefer VEPA over VEB as they prefer to have their traffic
> centrally managed instead of having the local host managing the
> traffic. We cannot just arbitrarily tell our customers they are doing
> SR-IOV using the "wrong model".
>
> I would rather not have SwitchDev become the next SystemD. The type of
> argument you are making is basically dictating to us and our customers
> how things are supposed to work based on your view of things. We have
> different hardware, different customers, and all of our needs aren't
> necessarily met by SwitchDev. I would agree that SwitchDev is the
> go-to solution for VEB configuration, and we do plan to have future
> hardware support it. In addition I would argue that for the sake of
> consistency we should make sure that any feature that gets added to
> the legacy model has to be supported by the SwitchDev model as well before
> it could be supported. If anything my hope is to evolve the legacy
> model to have much of the same look and feel as SwitchDev, but that
> will take time and require changes to the legacy model.
>
> I don't plan to have a ton of new features added to legacy SR-IOV, as
> I stated earlier my main concern is the "trusted" VF mode as that has
> become a security issue as everything is getting dumped into that so
> we need to break it up to get finer granularity. For example I am
> looking at adding a promisc/allmulti/multicast/broadcast control per
> VF to set the upper limit of what a VF can request to receive instead
> of just turning on "trusted" to allow a VF to turn on promiscuous. My
> only other concern is live migration. I don't know if that will
> require changes to the legacy SR-IOV mode or not, but it would be
> better to not have that door closed as an option than to have to work
> around it entirely.
>
> So, to summarize:
> 1. VEPA is still a thing, that implies no e-switch. Switchdev does not
> address that model.
> 2. I agree that SwitchDev is the way forward for VEB.
> 3. I agree we should focus on interface consistency so any new feature
> added to legacy mode has to also be enabled in SwitchDev.
>
> I hope this makes my point a bit clearer. I don't fundamentally
> disagree with the need to focus on having a consistent UAPI going
> forward. The only spot where we have issues is that I don't see
> SwitchDev as the only solution as we still have customers that aren't
> necessarily making use of an eswitch and telling them they are "doing
> it wrong" isn't really a viable solution. If nothing else I think we
> can look at re-evaluating this at the next netdev/netconf, and for now
> I would agree legacy SR-IOV changes should be under greater scrutiny.

Alex,

Lots of data and argumentation. It's too bad that none of it was
said/presented at the last netdev/netconf, nor at the previous
conferences (Feb 2016 / Oct 2016) when SRIOV switchdev was on the stage,
nor in the submissions that followed; these don't seem to be new data
points, at least to you. As you said, the switchdev mode for SRIOV has
been around for two years (merged in 4.8, but presented way back). You
waited two years to provide this input, and we will have to wait another
6 months for you to conduct a session on that.

Can you point out public use-cases / white-papers / design documents /
blueprints / etc. that employ the VEPA approach? B/c really no other
person/vendor brought it up... we were all dealing with the sriov
e-switch as a HW switch which should be programmed by the host stack
according to well known industry models that apply to physical
switches, e.g.:

1. L2 FDB (Linux Bridge)
2. L3 FIB (Linux Routers)
3. ACLs (Linux TC)

Item 3 is what is implemented by the upstream sriov switchdev drivers;
items 1 and 2 we discussed at netdev. Maybe you want to play with item 1
(L2 FDB) for i40e? I had a slide on that in the BoF.

Or.
