Message-ID: <CAJ3xEMhZ8uF4ohrQUGUHtVxzKUeBU7ejZqYAHw+HtkQ5voeGVg@mail.gmail.com>
Date: Thu, 28 Jun 2018 06:50:32 +0300
From: Or Gerlitz <gerlitz.or@...il.com>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc: Or Gerlitz <ogerlitz@...lanox.com>,
John Hurley <john.hurley@...ronome.com>,
Jiri Pirko <jiri@...lanox.com>,
Linux Netdev List <netdev@...r.kernel.org>,
ASAP_Direct_Dev <ASAP_Direct_Dev@...lanox.com>,
Simon Horman <simon.horman@...ronome.com>,
Andy Gospodarek <gospo@...adcom.com>
Subject: Re: [PATCH 0/6] offload Linux LAG devices to the TC datapath
On Thu, Jun 28, 2018 at 2:08 AM, Jakub Kicinski
<jakub.kicinski@...ronome.com> wrote:
> On Wed, 27 Jun 2018 23:07:29 +0300, Or Gerlitz wrote:
>> On Wed, Jun 27, 2018 at 1:31 AM, Jakub Kicinski
>> <jakub.kicinski@...ronome.com> wrote:
>> > On Tue, 26 Jun 2018 17:57:08 +0300, Or Gerlitz wrote:
>>
>> >> 2. re the egress side of things. Some NIC HWs can't just use LAG
>> >> as the egress port destination of an ACL (tc rule) and the HW rule
>> >> needs to be duplicated to both HW ports. So... in that case, you
>> >> see the HW driver doing the duplication (:() or we can somehow
>> >> make it happen from user-space?
>>
>> > It's the TC core that does the duplication. Drivers which don't need
>> > the duplication (e.g. mlxsw) will not register a new callback for each
>> > port on which shared block is bound. They will keep one list of rules,
>> > and a list of ports that those rules apply to.
>>
>> [snip]
>>
>> > Drivers which need duplication (multiplication) (all NICs?) have to
>> > register a new callback for each port bound to a shared block. And TC
>> > will call those drivers as many times as they have callbacks registered
>> > == as many times as they have ports bound to the block. Each time
>> > callback is invoked the driver will figure out the ingress port based
>> > on the cb_priv and use <ingress, cookie> as the key in its rule table
>> > (or have a separate rule table per ingress port).
>>
>> [snip snip]
>>
>> > I may be wrong, but I think you split the rules tables per port for mlx5
>>
>> correct, currently I have a rule table per physical port.
>>
>> > So again you just register a callback every time shared block is bound,
>> > and then TC core will send add/remove rule commands down to the driver,
>> > relaying existing rules as well if needed.
>>
>> Let's see, the NIC uplink rep port devices were bound (say) by ovs to
>> a shared-block because they are the lower devices (hate the slavish jargon)
>> of a bond device.
>>
>> Next, the TC stack will invoke the callback over these ports when an ingress
>> rule is added on the bond.
>>
>> But we are talking about an ingress rule set on a non-uplink rep (VF rep) port,
>> where bonding is the egress of the rule. I guess the callback which you probably
>> refer to (you hinted at it below) is the egdev one, correct? You are suggesting
>> that bonding will do egdev registration... I am a bit confused.
>
> Ah, you really meant egress. We don't have this problem, but yes, I
So how does it work for you? The rule is:
<ingress=vfrep netdev, egress=bond netdev>
From here, what does your driver logic do in order
to allow offloading into the lagged uplinks? Can you
point me to the code, please?

BTW, the bond doesn't have the same switchdev id as
the vfrep if you keep different switchdev ids
for the uplink reps under bonding. Do you unite them?
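
To make sure we are talking about the same thing on the ingress side, here is
a rough user-space sketch of the per-port callback scheme described in the
quoted text above: one callback registered per port bound to the shared block,
the core invoking every registered callback for each rule, and the driver
keying its rule table by <ingress, cookie>. This is a toy model, not kernel
code. All names in it (shared_block, port_priv, block_bind, block_rule,
driver_cb, rule_table) are made up for illustration and are not the TC or
driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CBS   8
#define MAX_RULES 32

/* Hypothetical per-port private data, standing in for cb_priv. */
struct port_priv {
	int ingress_port;
};

/* One entry in the (hypothetical) driver rule table, keyed by <ingress, cookie>. */
struct hw_rule {
	int ingress_port;
	uint64_t cookie;
	int in_use;
};

static struct hw_rule rule_table[MAX_RULES];

typedef int (*block_cb_t)(int add, uint64_t cookie, void *cb_priv);

/* Toy model of a shared block: a list of (callback, cb_priv) pairs. */
struct shared_block {
	block_cb_t cb[MAX_CBS];
	void *cb_priv[MAX_CBS];
	size_t ncbs;
};

/* Find the rule installed for this <ingress, cookie> pair, if any. */
static struct hw_rule *rule_lookup(int ingress_port, uint64_t cookie)
{
	for (size_t i = 0; i < MAX_RULES; i++) {
		struct hw_rule *r = &rule_table[i];

		if (r->in_use && r->ingress_port == ingress_port &&
		    r->cookie == cookie)
			return r;
	}
	return NULL;
}

/* Driver callback: derives the ingress port from cb_priv on every call. */
static int driver_cb(int add, uint64_t cookie, void *cb_priv)
{
	struct port_priv *priv = cb_priv;
	struct hw_rule *r = rule_lookup(priv->ingress_port, cookie);

	if (!add) {
		if (!r)
			return -1;	/* nothing to remove */
		r->in_use = 0;
		return 0;
	}
	if (r)
		return -1;		/* already installed for this port */
	for (size_t i = 0; i < MAX_RULES; i++) {
		if (!rule_table[i].in_use) {
			rule_table[i].ingress_port = priv->ingress_port;
			rule_table[i].cookie = cookie;
			rule_table[i].in_use = 1;
			return 0;
		}
	}
	return -1;			/* table full */
}

/* Register one callback per port bound to the block. */
static void block_bind(struct shared_block *b, block_cb_t cb, void *cb_priv)
{
	assert(b->ncbs < MAX_CBS);
	b->cb[b->ncbs] = cb;
	b->cb_priv[b->ncbs] = cb_priv;
	b->ncbs++;
}

/* Core side: call every registered callback once per rule add/remove. */
static int block_rule(struct shared_block *b, int add, uint64_t cookie)
{
	for (size_t i = 0; i < b->ncbs; i++) {
		int err = b->cb[i](add, cookie, b->cb_priv[i]);

		if (err)
			return err;
	}
	return 0;
}

/* Count rules currently present in the driver table. */
static size_t rules_installed(void)
{
	size_t n = 0;

	for (size_t i = 0; i < MAX_RULES; i++)
		n += rule_table[i].in_use ? 1 : 0;
	return n;
}
```

With two uplink ports bound to one block, a single rule add ends up as two
table entries, one per ingress port. That covers ingress on the bond; it does
not by itself say anything about the <ingress=vfrep, egress=bond> case I am
asking about.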