netdev - Re: [PATCH net-next 2/2] mv88e6131: bonding: implement single device trunking

Message-ID: <54FB360F.9060901@gmail.com>
Date:	Sat, 07 Mar 2015 09:31:59 -0800
From:	John Fastabend <john.fastabend@...il.com>
To:	Jiri Pirko <jiri@...nulli.us>
CC:	Scott Feldman <sfeldma@...il.com>, Andrew Lunn <andrew@...n.ch>,
	Florian Fainelli <f.fainelli@...il.com>,
	Jonas Johansson <jonasj76@...il.com>,
	Netdev <netdev@...r.kernel.org>,
	Jonas Johansson <jonas.johansson@...termo.se>
Subject: Re: [PATCH net-next 2/2] mv88e6131: bonding: implement single device
 trunking

On 03/07/2015 06:38 AM, Jiri Pirko wrote:
> Fri, Mar 06, 2015 at 11:43:55PM CET, sfeldma@...il.com wrote:
>> On Fri, Mar 6, 2015 at 1:47 PM, Andrew Lunn <andrew@...n.ch> wrote:
>>> Hi Florian
>>>
>>>> Most Broadcom switches, either SF2 or roboswitch (b53) have a limit of 2
>>>> trunking groups, without limitations on the number of ports included in
>>>> any of these two groups.
>>>
>>> O.K, so maybe we want the basic resource management in the DSA layer,
>>> not the switch drivers.
>>>
>>>> The larger question is once we start advertising capabilities, where
>>>> does that stop, right? It would probably be simpler for now to e.g:
>>>> allow 2 trunking groups to be configured, and when trying to configure a
>>>> 3rd one, return -ENOSPC and act upon that to either take the software
>>>> slow path (which is probably not possible) or just return a hard error
>>>> condition.
>>>
>>> This is more than a DSA question. It applies to all the hardware
>>> acceleration being discussed at the moment. As you hinted to above, i
>>> suppose we have two different situations:
>>>
>>> 1) We can fall back to a software slow path.
>>>
>>> 2) There is no software fallback, we have to error out, and it would
>>>     be nice to have a well defined error code for out of hardware
>>>     resources.
>>>

I have a case (3), where we never want to fall back to the slow path
even if it is possible, and a case (4), where we never want to offload
the operation even when it is possible. Having to program the device and
then unwind it from user space seems error-prone and ugly. At some
point I think we will have to handle these cases, but it's probably fine
to get the transparent offload working as a start. For LAG specifically,
I'm not sure there is a use case for (4), but in general it is useful.
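
To make the four cases concrete, here is a minimal userspace sketch, not
kernel code. Every name in it (lag_policy, lag_hw, lag_group_add) is
hypothetical and exists only for illustration; the limit of 2 groups is
the Broadcom figure quoted above, and -ENOSPC is the error code Florian
suggested for running out of hardware resources.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-request policy; nothing like this exists in the kernel. */
enum lag_policy {
	LAG_POLICY_AUTO,       /* cases (1)/(2): try hardware, fall back if we can */
	LAG_POLICY_REQUIRE_HW, /* case (3): never fall back, even if possible */
	LAG_POLICY_NEVER_HW,   /* case (4): never offload, even if possible */
};

enum lag_path { LAG_PATH_HW, LAG_PATH_SW };

/* Hypothetical view of one switch's trunking resources. */
struct lag_hw {
	unsigned int max_groups;  /* e.g. 2 on the Broadcom parts mentioned */
	unsigned int used_groups;
	bool sw_fallback;         /* is a software slow path available? */
};

/* Returns 0 and sets *path on success, or a negative errno on failure. */
static int lag_group_add(struct lag_hw *hw, enum lag_policy policy,
			 enum lag_path *path)
{
	if (policy == LAG_POLICY_NEVER_HW) {
		if (!hw->sw_fallback)
			return -EOPNOTSUPP;
		*path = LAG_PATH_SW;
		return 0;
	}

	if (hw->used_groups < hw->max_groups) {
		hw->used_groups++;
		*path = LAG_PATH_HW;
		return 0;
	}

	/* Hardware is full: only AUTO may take the slow path. */
	if (policy == LAG_POLICY_AUTO && hw->sw_fallback) {
		*path = LAG_PATH_SW;
		return 0;
	}

	return -ENOSPC;
}
```

With a policy like this the caller never has to program the device and
then unwind it: a REQUIRE_HW request on a full device fails with -ENOSPC
before anything is changed.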

>>> We also should think about how we tell user space we have fallen back
>>> to a slow path. I'm sure users want to know why it works, but works
>>> much slower.
>>
>> For the general hardware acceleration of bonds, my thoughts for switchdev are:
>>
>> (Assume 802.3ad LAG setup for discussion sake, other bonding modes are similar).
>>
>> 1. The driver has access to port membership notification via netevent,
>> so the driver knows which ports are in which bonds.  This notification is
>> passive; when a port is put in a bond, there is no way to signal to the
>> user whether it was programmed into a device LAG group or not.  It's
>> totally up to the driver and device resources to make that decision.
>
> Exactly. Whether another group can be added or not should be decided
> and handled by the driver. The driver then notifies higher layers using
> a switchdev notifier event.

Hmm, same point: I need to be able to know ahead of time whether it is
going to succeed or fail. I think we can extend this in two ways. One is
to add an explicit flag that says 'add this to hardware or fail' or
'never offload this', although for LAG setup this might be challenging
because the notification is passive, as noted. The other is to expose
enough of the hardware model and resources that software can predict
a priori whether the LAG setup will be accelerated. I think we need
both...
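
As a rough illustration of the second approach, here is a userspace
sketch of a check a management agent could run against exported
resources. The structure and function names (lag_resources,
lag_would_offload) are hypothetical, and the per-group port limit is an
assumed field; no such interface exists today.

```c
#include <stdbool.h>

/* Hypothetical resource snapshot exported by a driver; illustration only. */
struct lag_resources {
	unsigned int max_groups;          /* hardware trunk groups available */
	unsigned int used_groups;         /* groups already programmed */
	unsigned int max_ports_per_group; /* 0 means no per-group limit */
};

/* Predict whether a new LAG with nports members would be accelerated. */
static bool lag_would_offload(const struct lag_resources *res,
			      unsigned int nports)
{
	if (res->used_groups >= res->max_groups)
		return false;
	if (res->max_ports_per_group && nports > res->max_ports_per_group)
		return false;
	return true;
}
```

A prediction like this is only advisory, because the resources can change
between the check and the actual setup. An explicit 'hardware or fail'
flag would cover that window.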

My only point here is that managing the system eventually becomes
challenging if this is entirely a driver decision, at least in the many
cases where we have intelligent agents managing the network. But sure,
get the simple case working as a start.

>
>>
>> 2. The driver can know port active status via netevent as well.  If
>> device has the port in a LAG group, then reflect the active status
>> down to device port.  Again, a passive notification.
>>
>> 3. CPU-originating egress traffic would be LAGed by bonding (or team)
>> driver.  (This is true regardless if the device LAG group was setup or
>> not).
>>
>> 4a. If device LAG group setup, forwarded traffic would be LAG egressed
>> by device (fast path).
>>
>> 4b. If no device LAG group, ingress traffic is sent to CPU (slow path)
>> for bonding (or team) to LAG egress.
>>
>> So software fall-back (4b) is the default case if driver/device can't
>> setup LAG group for forwarding.
>>
>> So the question is: how does user know which bonds are accelerated?
>> So far, we've used the label "external" to mark L2 bridge FDB which
>> are offloaded to the device and "external" to mark L3 routes offloaded
>> to the device.  Do we mark bonds as "external" if the LAG path is
>> offloaded to the device?
>
> I believe that this is the way to go. Introduce this flag for bond and
> team to signal user if LAG is offloaded or not.
>
>
>>
>> -scott
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
John Fastabend         Intel Corporation