Message-ID: <20180314095027.GC2130@nanopsycho>
Date: Wed, 14 Mar 2018 10:50:27 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Or Gerlitz <gerlitz.or@...il.com>
Cc: Jiri Pirko <jiri@...lanox.com>, Rabie Loulou <rabiel@...lanox.com>,
John Hurley <john.hurley@...ronome.com>,
Jakub Kicinski <jakub.kicinski@...ronome.com>,
Simon Horman <simon.horman@...ronome.com>,
Linux Netdev List <netdev@...r.kernel.org>,
ASAP_Direct_Dev@...lanox.com, mlxsw <mlxsw@...lanox.com>
Subject: Re: [RFC net-next 2/6] driver: net: bonding: allow registration of
tc offload callbacks in bond
Tue, Mar 13, 2018 at 04:51:02PM CET, gerlitz.or@...il.com wrote:
>On Wed, Mar 7, 2018 at 12:57 PM, Jiri Pirko <jiri@...nulli.us> wrote:
>> Mon, Mar 05, 2018 at 02:28:30PM CET, john.hurley@...ronome.com wrote:
>>>Allow drivers to register netdev callbacks for tc offload in linux bonds.
>>>If a netdev has registered and is a slave of a given bond, then any tc
>>>rules offloaded to the bond will be relayed to it if both the bond and the
>>>slave permit hw offload.
>
>>>Because the bond itself is not offloaded, just the rules, we don't care
>>>about whether the bond ports are on the same device or whether some
>>>of the slaves are representor ports and some are not.
>
>John, I think we must design here for the case where the bond IS
>offloaded, e.g. some sort of HW LAG. For example, the mlxsw driver
>supports LAG offload and tc flower offload; we need to see how these
>two live together. mlx5 supports tc flower offload and we are working
>on bond offload, etc.
>
>>>+EXPORT_SYMBOL_GPL(tc_setup_cb_bond_register);
>>
>> Please, no "bond" specific calls from drivers. That would be wrong.
>> The idea behind block callbacks was that anyone who is interested could
>> register to receive those. In this case, slave device is interested.
>> So it should register to receive block callbacks in the same way as if
>> the block was directly on top of the slave device. The only thing you
>> need to handle is to propagate block bind/unbind from master down to the
>> slaves.
>
>Jiri,
>
>This sounds nice for the case where one installs ingress tc rules on
>the bond (let's call them type 1, see next).
>
>One obstacle pointed out by my colleague, Rabie: when the upper layer
>issues a stats call on the filter, it will get two replies; this can
>confuse it and lead to wrong decisions (aging). I wonder if/how we can
>set a knob somewhere that unifies the stats (add packets/bytes, use
>the latest lastuse).
The bonding itself would not do anything on a stats update command
(TC_CLSFLOWER_STATS for example). Only the slaves would do the update,
so there will be replies only from the slaves. Bond/team is just going
to propagate block bind/unbind down. Nothing else.
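To illustrate, here is a minimal sketch (hypothetical code, not the
patch from this RFC) of a bond ndo_setup_tc relaying TC_SETUP_BLOCK
down to the slaves:

/* Hypothetical sketch, assuming rtnl is held (as it is for
 * ndo_setup_tc). The bond registers no callback of its own; it only
 * relays the bind/unbind, so each slave can register against the
 * block as if the block sat directly on top of the slave device.
 */
static int bond_setup_tc_block(struct net_device *dev,
			       struct tc_block_offload *f)
{
	struct bonding *bond = netdev_priv(dev);
	struct list_head *iter;
	struct slave *slave;
	int err;

	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
		return -EOPNOTSUPP;

	bond_for_each_slave(bond, slave, iter) {
		if (!slave->dev->netdev_ops->ndo_setup_tc)
			continue;
		err = slave->dev->netdev_ops->ndo_setup_tc(slave->dev,
							   TC_SETUP_BLOCK, f);
		if (err && f->command == TC_BLOCK_BIND)
			return err;
	}
	return 0;
}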
>
>Also, let's see what other rules have to be offloaded in that scheme
>(call them types 2/3/4) where one has bonded two HW ports:
>
>2. bond being the egress port of a rule
>
>These are TC rules for an overlay network scheme, e.g. a NIC SRIOV
>scheme where one bonds the two uplink representors.
>
>Starting with type 2, with our current NIC HW APIs we have to
>duplicate each such rule into two rules set to HW:
>
>2.1 VF rep --> uplink 0
>2.2 VF rep --> uplink 1
>
>and we do that in the driver (add/del two HW rules, combine the stats
>results, etc.).
That is up to the driver. If the driver can share a block between the
two devices, it can do that. If it cannot share, it will just report
stats for every device separately (2 block cbs registered) and tc will
see them both together. No need to do anything special in the driver.
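For the non-shared case, the slave side could look roughly like this
(again a sketch; "struct slave_priv" and slave_setup_tc_block_cb() are
made-up names for the driver's private struct and its tc_setup_cb_t
handler):

static int slave_setup_tc_block(struct net_device *dev,
				struct tc_block_offload *f)
{
	struct slave_priv *priv = netdev_priv(dev);

	switch (f->command) {
	case TC_BLOCK_BIND:
		/* One registration per slave device. On
		 * TC_CLSFLOWER_STATS each callback pushes only its own
		 * HW counters via tcf_exts_stats_update(), and the
		 * action counters accumulate, so the upper layer sees
		 * a single combined answer.
		 */
		return tcf_block_cb_register(f->block,
					     slave_setup_tc_block_cb,
					     priv, priv);
	case TC_BLOCK_UNBIND:
		tcf_block_cb_unregister(f->block,
					slave_setup_tc_block_cb, priv);
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}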
>
>3. ingress rule on a VF rep port with a shared tunnel device being the
>egress (encap), and where the routing of the underlay (tunnel) goes
>through LAG.
>
>In our case, this is like 2.1/2.2 above: offload two rules, combine
>stats.
>
Same as type 2.
>4. ingress rule with a shared tunnel device being the ingress and a VF
>rep port being the egress (decap)
I don't follow :(
>
>This uses the egdev facility to be offloaded into our driver, and then
>in the driver we will treat it like type 1: two rules need to be
>installed into HW. But now we can't delegate them from the vxlan
>device b/c it has no direct connection with the bond.
I see another thing we need to sanitize: a vxlan rule with an ingress
match whose action is mirred redirect to a lag device.
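i.e. something like (hypothetical devices and values):

tc filter add dev vxlan0 ingress protocol ip flower \
	enc_key_id 100 enc_dst_ip 10.0.0.1 \
	action tunnel_key unset \
	action mirred egress redirect dev bond0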
>
>All in all, for the mlx5 use case, it seems we have an elegant
>solution only for type 1.
>
>I think we should do the elegant solution for the cases where it is
>applicable.
>
>In parallel, if/when newer HW APIs are there such that types 2 and 3
>can be set using one HW rule whose dest is the bond, we are good. As
>for type 4, we need to see if/how it can be made nicer.
>
>Or.