netdev - Re: [PATCH v3 net-next 2/4] net: dsa: Link aggregation support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201203215725.uuptum4qhcwvhb6l@skbuf>
Date:   Thu, 3 Dec 2020 23:57:25 +0200
From:   Vladimir Oltean <olteanv@...il.com>
To:     Tobias Waldekranz <tobias@...dekranz.com>
Cc:     davem@...emloft.net, kuba@...nel.org, andrew@...n.ch,
        vivien.didelot@...il.com, f.fainelli@...il.com,
        j.vosburgh@...il.com, vfalico@...il.com, andy@...yhouse.net,
        netdev@...r.kernel.org
Subject: Re: [PATCH v3 net-next 2/4] net: dsa: Link aggregation support

On Thu, Dec 03, 2020 at 09:53:08PM +0100, Tobias Waldekranz wrote:
> > I like the fact that we revert to a software-based implementation for
> > features the hardware can't offload. So rejecting other TX types is out
> > of the question.
>
> Well if we really want to be precise, we must also ensure that the exact
> hash type is supported by the hardware. mv88e6xxx only supports
> NETDEV_LAG_HASH_L2 for example. There is a needle to thread here I
> think. Story time ('tis the season after all):

Fine, I'll go along with the story-telling mood, even if that'll make my
message less than concise or informative.

>
>     A user, Linus, has just installed OpenWRT on his gateway. Finally,
>     he can unlock the full potential of his 802.11 AP by setting up a
>     LAG to it.
>
>     He carefully studies teamd.conf(5) and rightfully comes to the
>     conclusion that he should set up the tx_hash to include the full
>     monty of available keys. Teamd gets nothing but praise from the
>     kernel when applying the configuration.
>
>     And yet, Linus is not happy - the throughput between his NAS and his
>     smart TV is now lower than before. It is enough for Linus to start
>     working on his OS. It won't be big and professional like Linux of
>     course, but it will at least get this bit right.
>
> One could argue that if Linus had received an error instead, adapted his
> teamd config and tried again, he would be a happier user and we might
> not have to compete with his OS.
>
> I am not sure which way is the correct one, but I do not think that it
> necessarily _always_ correct to silently fallback to a non-offloaded
> mode.

Of course, neither is fully correct. There is always more to improve on
the communication side of things. But DSA, which stands for "Not Just A
Switch", promises you, above all, a port multiplexer with the ability to
make full use of the network stack. Maybe I'm still under the influence
of a recent customer meeting I had, but there is a large base of use
cases there, where people just don't seem to have enough ports, and they
can just add a $3 switch to their system, and voila, problem solved.
Everything works absolutely the same as before. The newly added switch
ports are completely integrated with the kernel's IP routing database.
They aren't even switch ports, they're just ports.

And even there, on the software fallback front, we don't do enough,
currently. I've had other customers who said that they even _prefer_
to do bridging in software, in order to get a chance to run their
netfilter/ebtables based firewall on their traffic. Of course, DSA
currently has no idea when netfilter rules are installed, so I just told
them to hack the driver and remove the bridging offload, which they
happily did.

I suspect you're looking at this from the wrong angle. I did, too, for
the longest time, because I was focused on squeezing the most out of the
hardware I had. And "I feel cheated because the operating system sits
between me and the performance I want from my hardware" is an argument
I've heard all too often. But not everybody needs even gigabits of
traffic, or absurdly low latency. Making a product that works acceptably
and at a low cost is better than hand-picking only hardware that can
accelerate everything and spending $$$$ on it, for a performance
improvement that no user will notice. Doing link aggregation in software
is fine. Doing bridging in software is fine. Sure, the CPU can't compete
past a number of KPPS, but maybe it doesn't even have to.

Again, this doesn't mean that we should not report, somehow, what is
offloaded and what isn't, and more importantly, the reason why
offloading the set of required actions is not supported. And I do
realize that I wrote a long rant about something that has nothing to do
with the topic at hand, which is: should we deny bonding interfaces that
use balance-rr, active-backup, broadcast, balance-tlb, balance-alb on
top of a DSA interface? Hell no, you wouldn't expect an intel e1000 card
to error out when you would set up a bonding interface that isn't xor or
802.3ad, would you? But you wouldn't go ahead and set up bridging
offload either, therefore running into the problem I raised above with
netfilter rules. You would just set out to do what the user asked for,
in the way that you can, and let them decide if the performance is what
they need or not.

> > Should we add an array of supported TX types that the switch port can
> > offload, and that should be checked by DSA in dsa_lag_offloading?
>
> That would work. We could also create a new DSA op that we would call
> for each chip from PRECHANGEUPPER to verify that it is supported. One
> advantage with this approach is that we can just pass along the `struct
> netdev_lag_upper_info` so drivers always have access to all information,
> in the event that new fields are added to it for example.

Hmm, not super pleased to have yet another function pointer, but I don't
have any other idea, so I guess that works just fine.
I would even go out on a limb and say hardcode the TX_TYPE_HASH in DSA
for now. I would be completely surprised to see hardware that can
offload anything else in the near future.