Date:   Mon, 13 Nov 2017 09:10:10 -0800
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Or Gerlitz <gerlitz.or@...il.com>
Cc:     David Miller <davem@...emloft.net>,
        Anjali Singhai Jain <anjali.singhai@...el.com>,
        Andy Gospodarek <gospo@...adcom.com>,
        Michael Chan <michael.chan@...adcom.com>,
        Simon Horman <simon.horman@...ronome.com>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>,
        John Fastabend <john.fastabend@...il.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Jiri Pirko <jiri@...lanox.com>,
        Rony Efraim <ronye@...lanox.com>,
        Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: SRIOV switchdev mode BoF minutes

On Sun, Nov 12, 2017 at 10:16 PM, Or Gerlitz <gerlitz.or@...il.com> wrote:
> On Sun, Nov 12, 2017 at 10:38 PM, Alexander Duyck
> <alexander.duyck@...il.com> wrote:
>> On Sun, Nov 12, 2017 at 11:49 AM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>>> Hi Dave and all,
>>>
>>> During and after the BoF on SRIOV switchdev mode, we reached a
>>> consensus among the developers from four different HW vendors (CC
>>> audience) that the correct thing to do would be to disallow any new
>>> extensions to the legacy mode.
>>>
>>> The idea is to put focus on the new mode and not add new UAPIs and
>>> kernel code for what turned out to be a wrong design, one that does not
>>> allow for properly offloading a kernel switching SW model to e-switch HW.
>
>> You may not recall, but we tried to transition the i40e driver over to
>> SwitchDev. The parts supported by i40e have a much more robust L2
>> forwarding framework than the 82599, yet we were told
>> that while we might look at doing port representors some other way,
>> there was no way we could use switchdev since the hardware couldn't
>> support the requirements of switchdev in terms of default routes and
>> forwarding behavior. I am planning to resolve the port representor
>> issue by coming up with something like a "source mode"
>> macvlan based port representor. I figure that is probably the closest
>> match for what the Intel hardware does, since really the VFs are
>> nothing more than a physical macvlan interface in and of themselves, as
>> the hardware doesn't have a full switch.
>
> Hi Alex,
>
> What we call the slow path requirements are the following:
>
> 1. xmit on a VF rep always turns into a receive on the VF, regardless of
> the offloaded SW steering rules ("send-to-vport")
>
> 2. xmit on a VF which doesn't match any offloaded SW steering rule must
> be received into the host OS from the VF rep
>
> 1,2 above must hold also for the uplink and the PF reps

I am well aware of the requirements. We discussed these with Jiri at
the previous netdev.
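
For anyone following along, the two slow-path requirements quoted above can be sketched as a toy model. This is only an illustration, not driver or kernel code: the `ESwitch` class, the `rules` table keyed on (source vport, destination MAC), and the `<vport>_rep` naming are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ESwitch:
    """Toy model of the slow-path rules (hypothetical, not a kernel API)."""

    # Offloaded steering rules: (source vport, dst MAC) -> destination vport.
    rules: dict = field(default_factory=dict)

    def xmit_on_rep(self, vport: str) -> str:
        # Requirement 1 ("send-to-vport"): a frame sent on the representor
        # is received on the represented vport, ignoring offloaded rules.
        return vport

    def xmit_on_vport(self, vport: str, dst_mac: str) -> str:
        # A frame that matches an offloaded rule stays in hardware.
        hit = self.rules.get((vport, dst_mac))
        if hit is not None:
            return hit
        # Requirement 2: on a miss, the frame is received by the host
        # from the representor of the sending vport.
        return f"{vport}_rep"


sw = ESwitch()
sw.rules[("vf0", "02:00:00:00:00:02")] = "vf1"
print(sw.xmit_on_rep("vf0"))                         # vf0
print(sw.xmit_on_vport("vf0", "02:00:00:00:00:02"))  # vf1
print(sw.xmit_on_vport("vf0", "02:00:00:00:00:99"))  # vf0_rep
```

The same two functions apply unchanged if `vport` is the uplink or the PF, which is what "1,2 above must hold also for the uplink and the PF reps" amounts to in this model.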

> When the i40e limitation was described at netdev, it seemed you have a problem
> with VF xmit that should be turned into a recv on the VF rep but also
> goes to the wire.
>
> It smells as if a FW patch can solve that, doesn't it?

That is a huge maybe. We looked into it last time and while we can
meet requirements 1 and 2 we do so with a heavy performance penalty
due to the fact that we don't support anywhere near the same number of
flows as a true switch. Also while that might work for i40e we still
have a much larger install base of ixgbe ports that we have to
support.

>> I would have to disagree with this. For devices such as 82599 that
>> don't have a true switch this may limit future functionality since
>> we can't move them over to switchdev mode. For example one thing I may
>> need to add is the ability to disable multicast and broadcast receive
>> on a per-VF basis at some point in the future.
>
> We are in the same boat with ConnectX3/mlx4, so lucky for us that misery loves
> company (my google search also yielded "many narrow-half consolation", is that
> completely unrelated?). The legacy mode for ixgbe/mlx4 has been there for ~8-10
> years, and since then both companies have had 2-3 newer HW generations. I don't
> see why you can't go to your customers and tell them that newish functionality
> needs newer HW. It will also help sell more of the new stuff. If you keep
> extending the legacy mode, more ppl/drivers will do that as well, and it will
> keep us from going in the right direction.
>
> Or.

Well I don't know about you guys, but we still are selling parts
supported by ixgbe and have still been adding new hardware as recently
as just a couple years ago. I'm not saying SwitchDev doesn't need to
be supported, if anything I am saying we need to leave the legacy
support extendable so that we can set up a glide path between the two.
If I can get the source mode macvlan port representor working the way I
hope we can start looking at getting our customers used to a SwitchDev
type environment without having to use full SwitchDev. That would help
to make them more amenable to moving over to devices that support that
in the future.

In addition this all works on the basis of all future SR-IOV devices
being based on a VEB. Do we know if there are any existing or future
devices that work in a VEPA type mode? The issue with ixgbe and i40e
is that they were designed to be a hybrid between the two but in my
opinion they lean much more toward the VEPA configuration with just a
little bit of loopback support to make the VEB setup work. As such we
end up with issues such as all broadcasts/multicasts always being
transmitted out the uplink port.
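
To make the VEB/VEPA distinction concrete, here is a rough sketch of where a frame sent by a VF ends up in each case. It is a toy model under stated assumptions, not a description of any real part: the function name `egress_ports`, the `flood_uplink` knob (standing in for a true switch letting the host control flooding), and the "hybrid" mode (my reading of the ixgbe/i40e behavior described above) are all hypothetical.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def egress_ports(mode, src_vf, dst_mac, vf_macs, flood_uplink=True):
    """Return the set of ports a frame sent by src_vf is delivered to."""
    is_flood = dst_mac == BROADCAST
    # Local VFs (other than the sender) that would accept this frame.
    local = {vf for vf, mac in vf_macs.items()
             if vf != src_vf and (is_flood or mac == dst_mac)}

    if mode == "vepa":
        # VEPA: everything goes to the adjacent external switch, which
        # may hairpin it back; no local switching in the NIC.
        return {"uplink"}
    if mode == "veb":
        # VEB: switched locally; flooding to the uplink is assumed to be
        # under host control in this model (flood_uplink).
        if is_flood:
            return local | ({"uplink"} if flood_uplink else set())
        return local or {"uplink"}
    if mode == "hybrid":
        # Hybrid as described above: unicast is looped back to a local VF,
        # but broadcasts always go out the uplink as well.
        if is_flood:
            return local | {"uplink"}
        return local or {"uplink"}
    raise ValueError(f"unknown mode: {mode}")


macs = {"vf0": "02:00:00:00:00:01", "vf1": "02:00:00:00:00:02"}
print(egress_ports("vepa", "vf0", macs["vf1"], macs))    # {'uplink'}
print(egress_ports("veb", "vf0", macs["vf1"], macs))     # {'vf1'}
print(sorted(egress_ports("hybrid", "vf0", BROADCAST, macs,
                          flood_uplink=False)))          # ['uplink', 'vf1']
```

In this model the only place the hybrid differs from a VEB is that the uplink copy of a broadcast cannot be suppressed, which is the issue raised above.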

If anything I think what we should define as a requirement would be
that we cannot add any future legacy items without adding support for
the same via the SwitchDev port representor.

- Alex
