netdev - Re: [RFC PATCH net-next 00/10] Use robust notifiers in DSA

Message-ID: <Yv6z5HTyenpJ+pex@lunn.ch>
Date:   Thu, 18 Aug 2022 23:49:24 +0200
From:   Andrew Lunn <andrew@...n.ch>
To:     Vladimir Oltean <vladimir.oltean@....com>
Cc:     netdev@...r.kernel.org, Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vladimir Oltean <olteanv@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Kevin Hilman <khilman@...nel.org>,
        Ulf Hansson <ulf.hansson@...aro.org>,
        Len Brown <len.brown@...el.com>, Pavel Machek <pavel@....cz>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [RFC PATCH net-next 00/10] Use robust notifiers in DSA

> I am posting this as RFC because something still feels off, but I can't
> exactly pinpoint what, and I'm looking for some feedback. Since most DSA
> switches are behind I/O protocols that can fail or time out (SPI, I2C,
> MDIO etc), everything can fail; that's a fact. On the other hand, when
> a network device or the entire system is torn down, nobody cares that
> SPI I/O failed - the system is still shutting down; that is also a fact.
> I'm not quite sure how to reconcile the two. On one hand we're
> suppressing errors emitted by DSA drivers in the non-robust form of
> notifiers, and on the other hand there's nothing we can do about them
> either way (upper layers don't necessarily care).

I would split it into two classes of errors:

Bus transactions fail. This very likely means the hardware design is
bad, connectors are loose, etc. There is not much we can do about
this; bad things are going to happen no matter what.

We have consumed all of some sort of resource. Out of memory, the ATU
is full, too many LAGs, etc. We try to roll back in order to get out
of this resource problem.

So I would say we don't care too much about -EIO and -ETIMEDOUT. For
-ENOMEM, -ENOBUFS, -EOPNOTSUPP or whatever, we should try to do a
robust rollback.
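
As a rough sketch of that split, something like the following could
decide whether a rollback is worth attempting. The name
err_wants_rollback() is made up for illustration and is not an existing
kernel or DSA helper; the errno values are the standard ones, and
treating every error other than -EIO/-ETIMEDOUT as resource-class is an
assumption that follows the "or whatever" above.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel API): classify a negative errno
 * into "bus transaction failed" versus "ran out of some resource".
 */
static bool err_wants_rollback(int err)
{
	switch (err) {
	case 0:
		/* Success, nothing to undo. */
		return false;
	case -EIO:
	case -ETIMEDOUT:
		/* Bus-class failure: not much we can do about it. */
		return false;
	default:
		/* -ENOMEM, -ENOBUFS, -EOPNOTSUPP, ...: try a robust rollback. */
		return true;
	}
}
```

A caller would check the return code of an operation with this and
only walk the undo path when it returns true.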

The original design of switchdev was two phase:

1) Allocate whatever resources are needed, can fail
2) Put those resources into use, must not fail

At some point that all got thrown away.
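
To illustrate the two-phase idea, here is a toy userspace sketch. All
names (toy_switch, toy_prepare(), toy_commit(), toy_abort()) are
hypothetical and are not the historical switchdev API. It only shows
the shape: everything that can fail happens in the prepare step, and
the commit step returns void.

```c
#include <errno.h>
#include <stdlib.h>

#define TOY_CAPACITY 2	/* hypothetical table size */

struct toy_entry {
	int id;
};

struct toy_switch {
	struct toy_entry *slots[TOY_CAPACITY];
	size_t used;		/* entries committed */
	size_t reserved;	/* entries prepared but not yet committed */
};

/* Phase 1: allocate and reserve whatever is needed. Can fail. */
static int toy_prepare(struct toy_switch *sw, int id, struct toy_entry **out)
{
	struct toy_entry *e;

	if (sw->used + sw->reserved >= TOY_CAPACITY)
		return -ENOSPC;

	e = calloc(1, sizeof(*e));
	if (!e)
		return -ENOMEM;

	e->id = id;
	sw->reserved++;
	*out = e;
	return 0;
}

/* Phase 2: put the reserved resources into use. Must not fail. */
static void toy_commit(struct toy_switch *sw, struct toy_entry *e)
{
	sw->reserved--;
	sw->slots[sw->used++] = e;
}

/* Undo a successful prepare when a later prepare step failed. */
static void toy_abort(struct toy_switch *sw, struct toy_entry *e)
{
	sw->reserved--;
	free(e);
}
```

Because the slot is reserved during prepare, a later commit cannot run
out of space, which is what lets it return void. Real hardware adds the
bus-failure case on top of this, which the sketch does not model.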

	Andrew