Date:   Tue, 26 Jun 2018 19:16:45 +0100
From:   John Hurley <john.hurley@...ronome.com>
To:     Or Gerlitz <ogerlitz@...lanox.com>
Cc:     Jakub Kicinski <jakub.kicinski@...ronome.com>,
        Jiri Pirko <jiri@...lanox.com>,
        Linux Netdev List <netdev@...r.kernel.org>,
        Simon Horman <simon.horman@...ronome.com>,
        Andy Gospodarek <gospo@...adcom.com>
Subject: Re: Fwd: [PATCH 0/6] offload Linux LAG devices to the TC datapath

On Tue, Jun 26, 2018 at 3:57 PM, Or Gerlitz <ogerlitz@...lanox.com> wrote:
>> -------- Forwarded Message --------
>> Subject: [PATCH 0/6] offload Linux LAG devices to the TC datapath
>> Date: Thu, 21 Jun 2018 14:35:55 +0100
>> From: John Hurley <john.hurley@...ronome.com>
>> To: dev@...nvswitch.org, roid@...lanox.com, gavi@...lanox.com, paulb@...lanox.com, fbl@...close.org, simon.horman@...ronome.com
>> CC: John Hurley <john.hurley@...ronome.com>
>>
>> This patchset extends OvS TC and the linux-netdev implementation to
>> support offloading Linux link aggregation (LAG) devices and their
>> slaves. TC blocks are used to provide this offload. Blocks, in TC,
>> group together a series of qdiscs. If a filter is added to one of
>> these qdiscs then it is applied to all of them. Similarly, if a
>> packet is matched on one of the grouped qdiscs then the stats for
>> the entire block are updated. The basis of the LAG offload is that
>> the LAG master (attached to the OvS bridge) and any slaves that may
>> exist outside of OvS are all added to the same TC block. OvS can
>> then control the filters and collect the stats on the slaves via
>> its interaction with the LAG master.
>>
>> The TC API is extended within OvS to allow a block id to be passed
>> when adding ingress qdiscs. A block id is then assigned to each LAG
>> master that is attached to the OvS bridge. The Linux netdev netlink
>> socket is used to monitor slave devices. If a LAG slave is found
>> whose master is on the bridge, it is added to the same block as its
>> master. If the underlying slaves belong to an offloadable device,
>> the Linux LAG device can be offloaded to hardware.
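
For anyone following the block mechanics above, a minimal tc sketch of
what this means at the CLI level (bond0, eth0 and eth1 are placeholder
device names, not taken from the patchset):

  # put the bond master and its slaves on the same shared ingress block
  tc qdisc add dev bond0 ingress_block 1 ingress
  tc qdisc add dev eth0 ingress_block 1 ingress
  tc qdisc add dev eth1 ingress_block 1 ingress

  # a filter added to the block applies to all three devices, and
  # stats are accounted for the block as a whole
  tc filter add block 1 protocol ip flower dst_ip 10.0.0.1 action drop

This is essentially what the extended OvS TC API does over netlink
when it passes a block id with its ingress qdisc adds.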
>
> Guys (J/J/J),
>
> Doing this here because:
>
> a. this has an impact on the kernel side of things
>
> b. I am more of a netdev citizen than an openvswitch one..
>
> Some comments:
>
> 1. this + Jakub's patch in the reply really make for a great design
>
> 2. re the egress side of things: some NIC HW can't just use a LAG
> as the egress port destination of an ACL (tc rule), so the HW rule
> needs to be duplicated to both HW ports. In that case, do you see
> the HW driver doing the duplication (:(), or can we somehow make it
> happen from user-space?
>

Hi Or,
I'm not sure how rule duplication would work for rules that egress to
a LAG device.
Perhaps it could be done for an active/backup mode, where user-space
adds the rule to one port and deletes it from the other as appropriate.
For load-balancing modes, where the egress port is selected based on a
hash of packet fields, it would be a lot more complicated.
As far as I'm aware, OvS can do this with its own bonds, but (if
recirculation is turned off) it basically creates exact-match datapath
entries for each packet flow.
Perhaps I don't fully understand your question?
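
To make the active/backup case concrete, a rough sketch of what
user-space could do (p0_repr is a hypothetical source device and the
match is illustrative; neither is from the patchset):

  # steer matching traffic to the currently active slave
  tc filter add dev p0_repr ingress protocol ip pref 1 flower \
      dst_ip 10.0.0.1 action mirred egress redirect dev eth0

  # on failover, delete the rule and re-add it towards the new
  # active slave
  tc filter del dev p0_repr ingress protocol ip pref 1 flower
  tc filter add dev p0_repr ingress protocol ip pref 1 flower \
      dst_ip 10.0.0.1 action mirred egress redirect dev eth1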

> 3. for the case of overlay networks, e.g. an OVS-based vxlan tunnel,
> the ingress (decap) rule is set on the vxlan device. Jakub, you
> mentioned a possible kernel patch to the HW (nfp, mlx5) drivers to
> have them bind to the tunnel device for ingress rules. If we have an
> agreed way to identify uplink representors, can we do that from ovs
> too? Does it matter if we are bonding + encapsulating or just
> encapsulating? Note that under the encap scheme the bond is typically
> not part of the OVS bridge.
>

If we have a way to bind the HW drivers to tunnel devs for ingress
rules, then this should work fine with OvS (possibly requiring a small
patch - I'd need to check).
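
For reference, a decap rule on the tunnel device would look something
like this at the tc level (the addresses, VNI and the vf0_repr
destination are illustrative only):

  # match on the outer (enc_*) headers and the VNI, strip the tunnel
  # metadata, then forward the inner packet on
  tc filter add dev vxlan0 ingress protocol ip flower \
      enc_dst_ip 10.0.0.1 enc_key_id 42 enc_dst_port 4789 \
      action tunnel_key unset \
      action mirred egress redirect dev vf0_repr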

In terms of bonding + encap, this probably needs to be handled in the
HW itself, for the same reason I mentioned in point 2.

> Or.
