Message-ID: <CAJ3xEMhOHOXCUG-0d6SdFoRvYUdmYGBrAQjoOAnxSsucx86LzQ@mail.gmail.com>
Date: Tue, 13 Mar 2018 17:53:39 +0200
From: Or Gerlitz <gerlitz.or@...il.com>
To: Jiri Pirko <jiri@...lanox.com>, Rabie Loulou <rabiel@...lanox.com>,
John Hurley <john.hurley@...ronome.com>
Cc: Jakub Kicinski <jakub.kicinski@...ronome.com>,
Simon Horman <simon.horman@...ronome.com>,
Linux Netdev List <netdev@...r.kernel.org>,
mlxsw <mlxsw@...lanox.com>,
Yevgeny Kliteynik <kliteyn@...lanox.com>,
Paul Blakey <paulb@...lanox.com>
Subject: Re: [RFC net-next 2/6] driver: net: bonding: allow registration of tc
offload callbacks in bond
On Tue, Mar 13, 2018 at 5:51 PM, Or Gerlitz <gerlitz.or@...il.com> wrote:
Sorry folks, I added a MLNX alias (ASAP_Direct_Dev@...lanox.com) which is
not open to outside posts. Please remove it from your replies, otherwise
your mail will bounce back.

Or.
> On Wed, Mar 7, 2018 at 12:57 PM, Jiri Pirko <jiri@...nulli.us> wrote:
>> Mon, Mar 05, 2018 at 02:28:30PM CET, john.hurley@...ronome.com wrote:
>>>Allow drivers to register netdev callbacks for tc offload in linux bonds.
>>>If a netdev has registered and is a slave of a given bond, then any tc
>>>rules offloaded to the bond will be relayed to it if both the bond and the
>>>slave permit hw offload.
>
>>>Because the bond itself is not offloaded, just the rules, we don't care
>>>about whether the bond ports are on the same device or whether some of
>>>slaves are representor ports and some are not.
>
> John, I think we must design here for the case where the bond IS offloaded,
> e.g. some sort of HW LAG. For example, the mlxsw driver supports
> LAG offload and supports tcflower offload, so we need to see how these
> two live together. mlx5 supports tcflower offload and we are working on
> bond offload, etc.
>
>>>+EXPORT_SYMBOL_GPL(tc_setup_cb_bond_register);
>>
>> Please, no "bond" specific calls from drivers. That would be wrong.
>> The idea behind block callbacks was that anyone who is interested could
>> register to receive those. In this case, slave device is interested.
>> So it should register to receive block callbacks in the same way as if
>> the block was directly on top of the slave device. The only thing you
>> need to handle is to propagate block bind/unbind from master down to the
>> slaves.
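
A rough illustration of the propagation described above, as a userspace toy
model and not kernel code. Every name here (toy_block, toy_dev, master_bind
and so on) is hypothetical and does not correspond to an existing kernel
symbol. The slave registers its callback on the block exactly as it would if
the block sat directly on top of it; the master only relays bind/unbind.

```c
#include <stddef.h>

#define TOY_MAX_SLAVES 8
#define TOY_MAX_CBS    4

/* Hypothetical stand-in for a block callback: one call per offloaded rule. */
typedef int (*toy_block_cb_t)(void *cb_priv, int rule_id);

/* Hypothetical stand-in for a tc block: just a list of registered callbacks. */
struct toy_block {
	toy_block_cb_t cb[TOY_MAX_CBS];
	void *cb_priv[TOY_MAX_CBS];
	size_t n_cbs;
};

/* Hypothetical stand-in for a netdev that can have a block bound to it. */
struct toy_dev {
	int (*bind)(struct toy_dev *dev, struct toy_block *blk);
	void (*unbind)(struct toy_dev *dev, struct toy_block *blk);
	struct toy_dev *slaves[TOY_MAX_SLAVES];
	size_t n_slaves;
	int rules_seen;	/* rules relayed to this device so far */
};

static int toy_block_cb_register(struct toy_block *blk, toy_block_cb_t cb,
				 void *priv)
{
	if (blk->n_cbs == TOY_MAX_CBS)
		return -1;
	blk->cb[blk->n_cbs] = cb;
	blk->cb_priv[blk->n_cbs] = priv;
	blk->n_cbs++;
	return 0;
}

static void toy_block_cb_unregister(struct toy_block *blk, toy_block_cb_t cb,
				    void *priv)
{
	for (size_t i = 0; i < blk->n_cbs; i++) {
		if (blk->cb[i] == cb && blk->cb_priv[i] == priv) {
			/* Fill the hole with the last entry. */
			blk->n_cbs--;
			blk->cb[i] = blk->cb[blk->n_cbs];
			blk->cb_priv[i] = blk->cb_priv[blk->n_cbs];
			return;
		}
	}
}

/* Slave side: the same code it would use for a block bound directly to it. */
static int slave_rule_cb(void *priv, int rule_id)
{
	struct toy_dev *dev = priv;

	(void)rule_id;
	dev->rules_seen++;
	return 0;
}

static int slave_bind(struct toy_dev *dev, struct toy_block *blk)
{
	return toy_block_cb_register(blk, slave_rule_cb, dev);
}

static void slave_unbind(struct toy_dev *dev, struct toy_block *blk)
{
	toy_block_cb_unregister(blk, slave_rule_cb, dev);
}

/* Master side: no offload of its own, it only relays bind/unbind. */
static void master_unbind(struct toy_dev *m, struct toy_block *blk)
{
	for (size_t i = 0; i < m->n_slaves; i++)
		m->slaves[i]->unbind(m->slaves[i], blk);
}

static int master_bind(struct toy_dev *m, struct toy_block *blk)
{
	for (size_t i = 0; i < m->n_slaves; i++) {
		int err = m->slaves[i]->bind(m->slaves[i], blk);

		if (err) {
			/* Undo the slaves that were already bound. */
			while (i--)
				m->slaves[i]->unbind(m->slaves[i], blk);
			return err;
		}
	}
	return 0;
}

/* Adding a rule to the block calls every registered callback. */
static int toy_block_add_rule(struct toy_block *blk, int rule_id)
{
	for (size_t i = 0; i < blk->n_cbs; i++) {
		int err = blk->cb[i](blk->cb_priv[i], rule_id);

		if (err)
			return err;
	}
	return 0;
}
```

In this model a rule added to the block bound on the master reaches each
slave once, with no bond-specific call made from the slave's side.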
>
> Jiri,
>
> This sounds nice for the case where one installs ingress tc rules on
> the bond (let's call them type 1, see next).
>
> One obstacle pointed out by my colleague, Rabie, is that when the upper
> layer issues a stats call on the filter, it will get two replies. This can
> confuse it and lead to wrong decisions (aging). I wonder if/how we can set
> a knob somewhere that unifies the stats (add packets/bytes, use the latest
> lastuse).
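
The unification suggested above (add packets/bytes, use the latest lastuse)
could look roughly like the following. This is a minimal sketch with
hypothetical names (toy_flow_stats, toy_stats_merge), not an existing kernel
structure or helper, and it says nothing about where such a knob would live.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-rule counters as reported by one slave. */
struct toy_flow_stats {
	uint64_t packets;
	uint64_t bytes;
	uint64_t lastused;	/* time of the most recent hit */
};

/*
 * Fold the replies from n slaves into a single reply: counters are
 * summed, and the most recent lastused wins so that aging sees the
 * rule as active if any slave saw traffic.
 */
static struct toy_flow_stats
toy_stats_merge(const struct toy_flow_stats *per_slave, size_t n)
{
	struct toy_flow_stats total = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		total.packets += per_slave[i].packets;
		total.bytes += per_slave[i].bytes;
		if (per_slave[i].lastused > total.lastused)
			total.lastused = per_slave[i].lastused;
	}
	return total;
}
```

The plain ">" comparison assumes the timestamp does not wrap; a jiffies-based
value would need a wraparound-safe comparison instead.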
>
> Also, let's see what other rules have to be offloaded in that scheme
> (call them type 2/3/4), where one has bonded two HW ports:
>
> 2. bond being egress port of a rule
>
> TC rules for overlay networks scheme, e.g in NIC SRIOV
> scheme where one bonds the two uplink representors
>
> Starting with type 2, in our current NIC HW APIs we have to duplicate
> these rules into two rules set to HW:
>
> 2.1 VF rep --> uplink 0
> 2.2 VF rep --> uplink 1
>
> and we do that in the driver (add/del two HW rules, combine the stat
> results, etc)
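
The add/del duplication for type 2 (2.1 and 2.2 above) can be modeled as
below. This is a toy sketch only: toy_uplink, toy_hw_add and the fail_next
test hook are hypothetical and do not reflect the mlx5 driver or its HW API.
The point it shows is that one logical rule becomes one HW rule per uplink,
and a failure on the second uplink rolls back the first.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_N_UPLINKS 2
#define TOY_MAX_RULES 16

/* Hypothetical model of the HW rule table behind one uplink. */
struct toy_uplink {
	bool installed[TOY_MAX_RULES];	/* indexed by rule id */
	bool fail_next;			/* test hook: fail the next add */
};

static int toy_hw_add(struct toy_uplink *up, unsigned int id)
{
	if (id >= TOY_MAX_RULES)
		return -1;
	if (up->fail_next) {
		up->fail_next = false;
		return -1;
	}
	up->installed[id] = true;
	return 0;
}

static void toy_hw_del(struct toy_uplink *up, unsigned int id)
{
	if (id < TOY_MAX_RULES)
		up->installed[id] = false;
}

/* One logical "VF rep --> bond" rule becomes one HW rule per uplink. */
static int toy_bond_rule_add(struct toy_uplink *ups, unsigned int id)
{
	for (size_t i = 0; i < TOY_N_UPLINKS; i++) {
		if (toy_hw_add(&ups[i], id)) {
			/* Roll back so HW never holds half of the pair. */
			while (i--)
				toy_hw_del(&ups[i], id);
			return -1;
		}
	}
	return 0;
}

static void toy_bond_rule_del(struct toy_uplink *ups, unsigned int id)
{
	for (size_t i = 0; i < TOY_N_UPLINKS; i++)
		toy_hw_del(&ups[i], id);
}
```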
>
> 3. ingress rule on VF rep port with shared tunnel device being the
> egress (encap) and where the routing of the underlay (tunnel) goes
> through LAG.
>
> in our case, this is like 2.1/2.2 above, offload two rules, combine stats
>
> 4. ingress rule with shared tunnel device being the ingress and VF rep
> port being the egress (decap)
>
> this uses the egdev facility to be offloaded into our driver, and then
> in the driver we will treat it like type 1: two rules need to be
> installed into HW. But now we can't delegate them from the vxlan device
> b/c it has no direct connection with the bond.
>
> All in all, for the mlx5 use case, it seems we have an elegant solution
> only for type 1.
>
> I think we should do the elegant solution for the case where it is applicable.
>
> In parallel, if/when newer HW APIs are there such that type 2 and 3 can
> be set using one HW rule whose dest is the bond, we are good. As for
> type 4, need to see if/how it can be nicer.
>
> Or.