Date:   Tue, 14 Nov 2017 12:00:32 -0800
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Or Gerlitz <gerlitz.or@...il.com>
Cc:     David Miller <davem@...emloft.net>,
        Anjali Singhai Jain <anjali.singhai@...el.com>,
        Andy Gospodarek <gospo@...adcom.com>,
        Michael Chan <michael.chan@...adcom.com>,
        Simon Horman <simon.horman@...ronome.com>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>,
        John Fastabend <john.fastabend@...il.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Jiri Pirko <jiri@...lanox.com>,
        Rony Efraim <ronye@...lanox.com>,
        Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: SRIOV switchdev mode BoF minutes

On Tue, Nov 14, 2017 at 8:44 AM, Or Gerlitz <gerlitz.or@...il.com> wrote:
> On Mon, Nov 13, 2017 at 7:10 PM, Alexander Duyck
> <alexander.duyck@...il.com> wrote:
>> On Sun, Nov 12, 2017 at 10:16 PM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>>> On Sun, Nov 12, 2017 at 10:38 PM, Alexander Duyck
>
>>> The what we call slow path requirements are the following:
>>>
>>> 1. xmit on VF rep always turns to a receive on the VF, regardless of
>>> the offloaded SW steering rules ("send-to-vport")
>>>
>>> 2. xmit on VF which doesn't meet any offloaded SW steering rules must
>>> be received into the host OS from the VF rep
>
>>> 1,2 above must hold also for the uplink and the PF reps
>
>> I am well aware of the requirements. We discussed these with Jiri at
>> the previous netdev.
>
>>> When the i40e limitation was described to @ netdev, it seems you have a problem
>>> with VF xmit that should be turned to be a recv on the VF rep but also
>>> goes to the wire.
>
>>> It smells as if a FW patch can solve that, isn't that?
>
>> That is a huge maybe. We looked into it last time and while we can
>> meet requirements 1 and 2 we do so with a heavy performance penalty
>> due to the fact that we don't support anywhere near the same number of
>> flows as a true switch. Also while that might work for i40e
>
> to recap on i40e, you can support the slow path requirements, but  you have an
> issue with the fast path (== offloaded flows)? what is the issue there?

We basically need to do some feasibility research to see if we can
actually meet all the requirements for switchdev on i40e. We have been
getting mixed messages where we are given a great many "yes, but" type
answers. For i40e we are looking into it but I don't have high
confidence in our ability to actually support it in hardware/firmware.
If it were as easy as you have been led to believe, we would have done
it months ago when we were researching the requirements to support
switchdev. In addition i40e isn't really my concern. I am much more
concerned about ixgbe as it has a much larger install base and many
more customers that are still buying it today.

>> we still have a much larger install base of ixgbe ports that we have to support.
>
> ok, but support is one thing and keep enhancing a ten years old wrong
> SW model is 2nd thing

The model might be 10 years old, but as I said we are still shipping
new silicon that was released just over a year ago that is supported
by the ixgbe driver.

Also I don't know if the term "enhancing" is the right word for what I
am thinking. I'm not talking about adding new drivers that only
support legacy mode.  We are looking at probably having to refactor
the whole concept of "trusted" VF in order to break it out into
smaller buckets. In addition I plan to come up with a source mode
macvlan based "port representor" for legacy SR-IOV and hope to be able
to use that to start working on a better path for SR-IOV live
migration.

Fundamentally the problem I have with us saying we cannot extend
legacy mode SR-IOV is that 82599 is a very large piece of the existing
install base for 10Gbit in general. We have it shipping on brand new
platforms as the silicon that is installed on the motherboard. With
that being the case people are going to want to get the most value
they can out of the silicon that they purchased since in many cases it
is just a standard part of the platform.

>>>> I would have to disagree with this. For devices such as 82599 that
>>>> doesn't have a true switch this may limit future functionality since
>>>> we can't move it over to switchdev mode. For example one thing I may
>>>> need to add is the ability to disable multicast and broadcast receive
>>>> on a per-VF basis at some point in the future.
>
>>> We are on the same boat with ConnectX3/mlx4, so us lucky that misery loves
>>> company (my google search also yielded "many narrow-half consolation" is that
>>> completely unrelated?) - the legacy mode for ixgbe/mlx4 is there for ~8-10 years
>>> - and since then both companies had 2-3 newer HW generations. I don't see why
>>> you can't come to your customers and tell that newish functionality needs newer
>>> HW - it will also help sell more from the new stuff..  If you keep
>>> extending the legacy mode, more ppl/drivers will do that as well and it will not let us go
>>> in the right direction.
>
>> Well I don't know about you guys, but we still are selling parts
>> supported by ixgbe
>
> Same here, we are selling lots of CX3 and have to support that, but I didn't
> see why someone will want new features there.

I think the difference is that we get pressed on as part of the
platform instead of being a single component. If a customer wants some
specific feature enabled on 82599 as a part of the platform we tend to
need to go along with it in order to avoid being a roadblock in a sale
of other components.

>> still been adding new hardware as recently as just a couple years ago.
>
> wait, that's different story.
>
> You are saying that your older HW doesn't support e-switch
> and you want to keep doing new parts of that older HW and you want the
> kernel to keep enhance a wrong SW model b/c you are doing new parts
> from old HW, I don't see why we as a community need to go there.

I'm not saying we have new parts. I'm saying we have existing parts
that will likely need some work done. SwitchDev was only introduced
about 2 years ago. We have parts that were released around or before
then with functionality that didn't anticipate this. What I am arguing
is that we still haven't finished implementing all the features that
were available on those parts. New features usually keep going in for
several years after a part is released, typically somewhere in the 3 to
5 year range.

> Lets focus on this point for a moment before discussing the other points
> you raised.
>
> Or.

When SR-IOV was introduced there were two available modes, Virtual
Ethernet Port Aggregation, aka VEPA, and Virtual Ethernet Bridging,
aka VEB. The fact is SwitchDev is designed specifically for SR-IOV
networking with VEB. You argue that the legacy model is bad, but I
would argue that is because the legacy model was really designed to
work with both VEPA and VEB, whereas SwitchDev only focuses on VEB. If
you take a look in the ixgbe
or i40e drivers you will see that we support configuring both of those
modes via ndo_bridge_setlink since we have customer install bases that
actually prefer VEPA over VEB as they prefer to have their traffic
centrally managed instead of having the local host managing the
traffic. We cannot just arbitrarily tell our customers they are doing
SR-IOV using the "wrong model".
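
As a rough illustration of the difference between the two modes (a
sketch only, not code from ixgbe or i40e; every name below is made up
for the example): in VEB the NIC can switch a frame between two local
VFs itself, while in VEPA every VF frame goes out the uplink so the
adjacent external switch makes the forwarding decision. The real kernel
constants for the two modes are BRIDGE_MODE_VEB and BRIDGE_MODE_VEPA in
linux/if_bridge.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from any driver. */
enum sriov_mode { SRIOV_MODE_VEB, SRIOV_MODE_VEPA };
enum tx_egress  { TX_TO_LOCAL_VF, TX_TO_UPLINK };

/* Decide where a frame transmitted by a VF goes next. */
enum tx_egress vf_tx_egress(enum sriov_mode mode, bool dst_is_local_vf)
{
    /* VEPA: never switch locally, the external switch decides. */
    if (mode == SRIOV_MODE_VEPA)
        return TX_TO_UPLINK;

    /* VEB: VF-to-VF traffic stays inside the NIC. */
    return dst_is_local_vf ? TX_TO_LOCAL_VF : TX_TO_UPLINK;
}

/* Usage: the same VF-to-VF frame takes a different path per mode. */
void vf_tx_egress_example(void)
{
    assert(vf_tx_egress(SRIOV_MODE_VEB, true) == TX_TO_LOCAL_VF);
    assert(vf_tx_egress(SRIOV_MODE_VEPA, true) == TX_TO_UPLINK);
}
```

The point of the sketch is that VEPA has no local switching decision to
offload, which is why an e-switch centric model does not describe it.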

I would rather not have SwitchDev become the next SystemD. The type of
argument you are making is basically dictating to us and our customers
how things are supposed to work based on your view of things. We have
different hardware, different customers, and all of our needs aren't
necessarily met by SwitchDev. I would agree that SwitchDev is the
go-to solution for VEB configuration, and we do plan to have future
hardware support it. In addition I would argue that for the sake of
consistency we should make sure that any feature that gets added to
the legacy model has to be supported by the SwitchDev model as well
before it can be accepted. If anything my hope is to evolve the legacy
model to have much of the same look and feel as SwitchDev, but that
will take time and require changes to the legacy model.

I don't plan to have a ton of new features added to legacy SR-IOV. As
I stated earlier, my main concern is the "trusted" VF mode, which has
become a security issue because everything is getting dumped into it.
We need to break it up to get finer granularity. For example I am
looking at adding a promisc/allmulti/multicast/broadcast control per
VF to set the upper limit of what a VF can request to receive instead
of just turning on "trusted" to allow a VF to turn on promiscuous. My
only other concern is live migration. I don't know if that will
require changes to the legacy SR-IOV mode or not, but it would be
better to not have that door closed as an option than to have to work
around it entirely.
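
To make the "upper limit" idea concrete, here is a minimal sketch of
what I have in mind. It is not an existing kernel or driver interface;
the flag names, the struct, and the function are all hypothetical. The
host admin sets a ceiling per VF, and whatever receive modes the VF
asks for are clamped to that ceiling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical receive-mode bits a VF may request; not a kernel API. */
enum vf_rx_flag {
    VF_RX_BROADCAST = 1u << 0,
    VF_RX_MULTICAST = 1u << 1,  /* explicitly listed multicast addresses */
    VF_RX_ALLMULTI  = 1u << 2,
    VF_RX_PROMISC   = 1u << 3,
};

/* Per-VF ceiling configured from the host (PF) side. */
struct vf_rx_policy {
    uint32_t allowed;
};

/* Grant only the requested modes that fall under the admin ceiling. */
uint32_t vf_rx_grant(const struct vf_rx_policy *policy, uint32_t requested)
{
    return requested & policy->allowed;
}

/* Usage: a VF limited to broadcast + multicast cannot get promisc. */
void vf_rx_grant_example(void)
{
    struct vf_rx_policy p = {
        .allowed = VF_RX_BROADCAST | VF_RX_MULTICAST,
    };

    assert(vf_rx_grant(&p, VF_RX_PROMISC | VF_RX_MULTICAST)
           == VF_RX_MULTICAST);
}
```

A single "trusted" bit cannot express a policy like the one in the
example, which is the granularity problem described above.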

So, to summarize:
1. VEPA is still a thing, and it implies no e-switch. SwitchDev does
not address that model.
2. I agree that SwitchDev is the way forward for VEB.
3. I agree we should focus on interface consistency so any new feature
added to legacy mode has to also be enabled in SwitchDev.

I hope this makes my point a bit clearer. I don't fundamentally
disagree with the need to focus on having a consistent UAPI going
forward. The only spot where we have issues is that I don't see
SwitchDev as the only solution as we still have customers that aren't
necessarily making use of an eswitch and telling them they are "doing
it wrong" isn't really a viable solution. If nothing else I think we
can look at re-evaluating this at the next netdev/netconf, and for now
I would agree legacy SR-IOV changes should be under greater scrutiny.

- Alex
