Message-ID: <87lfem8k2b.fsf@waldekranz.com>
Date: Sat, 28 Nov 2020 00:19:08 +0100
From: Tobias Waldekranz <tobias@...dekranz.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: Florian Fainelli <f.fainelli@...il.com>, davem@...emloft.net,
kuba@...nel.org, vivien.didelot@...il.com, olteanv@...il.com,
j.vosburgh@...il.com, vfalico@...il.com, andy@...yhouse.net,
netdev@...r.kernel.org
Subject: Re: [PATCH net-next 2/4] net: dsa: Link aggregation support
On Fri, Nov 27, 2020 at 17:28, Andrew Lunn <andrew@...n.ch> wrote:
>> This is a digression, but I really do not get this shift from using
>> BUG()s to WARN()s in the kernel when you detect a violated invariant. It
>> smells of "On Error Resume Next" to me.
>
> A WARN() gives you a chance to actually use the machine to collect
> logs, the kernel dump, etc. You might be able to sync the filesystems,
I completely agree that collecting the logs is very important. But I
think there are ways of solving that problem that work for both BUG()
and WARN(), e.g. pstore or mtdoops.
> reducing the damage to the disks. With BUG(), the machine is
Well, you are betting that whatever bug you just discovered has not
affected the VFS in a way that causes you to sync bad data back to the
disk. I would rather take a hard reboot and rely on the filesystem's
journal to bring it back up in a consistent state.
> dead. You cannot interact with it, all you have to go on, is what you
> can see on the disk, or what you might have logged on the serial
> console.
I guess we are coming at this problem from different angles. For me it
is not OK to just leave the device in limp-mode until an administrator
can inspect it and perform a reboot, as that can take weeks in some
cases. I would much rather have a hard reboot, bring the system back to
a fully operational state, detect the panic during the next boot,
prepare a debug tarball with the pstore data, and signal an alarm of
some kind.
Agree to disagree? :)
>> We have to handle EWHATEVER correctly, no? I do not get what is so
>> special about ENOMEM.
>
> Nothing is particularly special about it. But looking at the current
> code base the only other error we can get is probably ETIMEDOUT from
> an MDIO read/write. But if that happens, there is probably no real
> recovery. You have no idea what state the switch is in, and all other
> MDIO calls are likely to fail in the same way.
>
>> How would a call to kmalloc have any impact on the stack? (Barring
>> exotic bugs in the allocator that would allow the heap to intrude on
>> stack memory) Or am I misunderstanding what you mean by "the stack"?
>
> I mean the network stack, top to bottom. Say we have a few vlan
> interfaces, on top of the bridge, on top of a LAG, on top of DSA, on
> top of IP over avian carriers. When setting up a LAG, what else has
> happened by the time you get your ENOMEM? What notifications have
> already happened, which some other layer has acted upon? It is not
> just LAG inside DSA which needs to unwind, it is all the actions which
> have been triggered so far.
>
> The initial design of switchdev was transactions. First there was a
> prepare call, where you validated the requested action is possible,
> and allocate resources needed, but don't actually do it. This prepare
> call is allowed to fail. Then there is a second call to actually do
> it, and that call is not allowed to fail. This structure avoids most
> of the complexity of the unwind, just free up some resources. If you
> never had to allocate the resources in the first place, better still.
OK, I think I finally see what you are saying. Sorry it took me this
long. I do not mean to be difficult; I just want to understand.
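Just to check that I have understood the model, the shape of it would be
something like the sketch below. None of these names are the real
switchdev API; lag_prepare(), lag_commit() and struct lag_trans are made
up purely for illustration.

/* Illustrative only: the two-phase shape, not actual switchdev code. */
struct lag_trans {
	int slot;	/* resource reserved during the prepare phase */
};

/* Phase 1: validate the request and reserve anything that can fail.
 * This is the only phase that may return an error.
 */
static int lag_prepare(struct lag_trans *t, bool *slot_busy, int num_slots)
{
	int i;

	for (i = 0; i < num_slots; i++) {
		if (!slot_busy[i]) {
			slot_busy[i] = true;	/* reserved, cannot be lost */
			t->slot = i;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Phase 2: apply the change using what phase 1 reserved. Nothing here
 * is allowed to fail, so no other layer ever has to unwind.
 */
static void lag_commit(const struct lag_trans *t)
{
	/* ...program the hardware for t->slot... */
}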
How about this (rough sketch of the DSA side below the list):
- Add a `lags_max` field to `struct dsa_switch` to let each driver
define the maximum number supported by the hardware. By default this
would be zero, which would mean that LAG offloading is not supported.
- In dsa_tree_setup, we allocate a fixed-size array sized to the minimum
supported number of LAGs across the entire tree.
- When joining a new LAG, we ensure that a slot is available in
NETDEV_PRECHANGEUPPER, avoiding the issue you are describing.
- In NETDEV_CHANGEUPPER, we actually mark it as busy and start using it.
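In completely made-up code (struct dsa_lag, dsa_lag_reserve() and
dsa_lag_join() are just names I am inventing for this sketch, not
anything that exists today):

/* Sketch only; field and function names are hypothetical. */
struct dsa_switch {
	/* ...existing members... */
	unsigned int lags_max;	/* 0 == no LAG offload support */
};

struct dsa_lag {
	struct net_device *dev;	/* bond/team device, NULL when free */
	refcount_t refcount;	/* number of ports using this slot */
};

struct dsa_switch_tree {
	/* ...existing members... */
	struct dsa_lag *lags;	/* allocated in dsa_tree_setup() */
	unsigned int lags_len;	/* min(lags_max) over all switches */
};

/* NETDEV_PRECHANGEUPPER: only verify that a slot is (or will be)
 * available. This is the one place allowed to say no.
 */
static int dsa_lag_reserve(struct dsa_switch_tree *dst,
			   struct net_device *lag_dev)
{
	unsigned int i;

	for (i = 0; i < dst->lags_len; i++)
		if (!dst->lags[i].dev || dst->lags[i].dev == lag_dev)
			return 0;
	return -ENOSPC;
}

/* NETDEV_CHANGEUPPER: mark the slot busy and start using it. Both
 * notifiers run under rtnl, so the slot found above is still there
 * and this cannot fail.
 */
static void dsa_lag_join(struct dsa_switch_tree *dst,
			 struct net_device *lag_dev)
{
	unsigned int i;

	for (i = 0; i < dst->lags_len; i++) {
		if (dst->lags[i].dev == lag_dev) {
			refcount_inc(&dst->lags[i].refcount);
			return;
		}
	}

	for (i = 0; i < dst->lags_len; i++) {
		if (!dst->lags[i].dev) {
			dst->lags[i].dev = lag_dev;
			refcount_set(&dst->lags[i].refcount, 1);
			return;
		}
	}
}

The important property is that NETDEV_PRECHANGEUPPER is the only step
that can refuse, so by the time NETDEV_CHANGEUPPER arrives nothing can
fail and nothing in the layers above needs to be unwound.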