netdev - Re: [PATCH net-next 2/4] net: dsa: Link aggregation support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20201127162818.GT2073444@lunn.ch>
Date:   Fri, 27 Nov 2020 17:28:18 +0100
From:   Andrew Lunn <andrew@...n.ch>
To:     Tobias Waldekranz <tobias@...dekranz.com>
Cc:     Florian Fainelli <f.fainelli@...il.com>, davem@...emloft.net,
        kuba@...nel.org, vivien.didelot@...il.com, olteanv@...il.com,
        j.vosburgh@...il.com, vfalico@...il.com, andy@...yhouse.net,
        netdev@...r.kernel.org
Subject: Re: [PATCH net-next 2/4] net: dsa: Link aggregation support

> This is a digression, but I really do not get this shift from using
> BUG()s to WARN()s in the kernel when you detect a violated invariant. It
> smells of "On Error Resume Next" to me.

A WARN() gives you a chance to actually use the machine to collect
logs, the kernel dump, etc. You might be able to sync the filesystems,
reducing the damage to the disks.  With BUG(), the machine is
dead. You cannot interact with it, all you have to go on, is what you
can see on the disk, or what you might have logged on the serial
console.

> We have to handle EWHATEVER correctly, no? I do not get what is so
> special about ENOMEM.

Nothing is particularly special about it. But looking at the current
code base the only other error we can get is probably ETIMEDOUT from
an MDIO read/write. But if that happens, there is probably no real
recovery. You have no idea what state the switch is in, and all other
MDIO calls are likely to fail in the same way.

> How would a call to kmalloc have any impact on the stack? (Barring
> exotic bugs in the allocator that would allow the heap to intrude on
> stack memory) Or am I misunderstanding what you mean by "the stack"?

I mean the network stack, top to bottom. Say we have a few vlan
interfaces, on top of the bridge, on top of a LAG, on top of DSA, on
top of IP over avian carriers. When setting up a LAG, what else has
happened by the time you get your ENOMEM? What notifications have
already happened, which some other layer has acted upon? It is not
just LAG inside DSA which needs to unwind, it is all the actions which
have been triggered so far.

The initial design of switchdev was transactions. First there was a
prepare call, where you validated the requested action is possible,
and allocate resources needed, but don't actually do it. This prepare
call is allowed to fail. Then there is a second call to actually do
it, and that call is not allowed to fail. This structure avoids most
of the complexity of the unwind, just free up some resources. If you
never had to allocate the resources in the first place, better still.

      Andrew