netdev - Re: [PATCH net-next v14 5/7] net: dsa: mv88e6xxx: rmu: Add functionality to get RMON

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <YyybG/MsUIUji4UH@lunn.ch>
Date:   Thu, 22 Sep 2022 19:27:55 +0200
From:   Andrew Lunn <andrew@...n.ch>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     Mattias Forsblad <mattias.forsblad@...il.com>,
        netdev@...r.kernel.org, Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, linux@...linux.org.uk,
        ansuelsmth@...il.com
Subject: Re: [PATCH net-next v14 5/7] net: dsa: mv88e6xxx: rmu: Add
 functionality to get RMON

> > I was thinking within mv88e6xxx_read() and mv88e6xxx_write(). Keep a
> > buffer for building requests. Each write call appends the write to the
> > buffer and returns 0. A read call gets appended to the buffer and then
> > executes the RMU. We probably also need to wrap the reg mutex, so that
> > when it is released, any buffered writes get executed. If the RMU
> > fails, we have all the information needed to do the same via MDIO.
> 
> Ah, so you want to make the mv88e6xxx_reg_unlock() become an implicit
> write barrier.

I'm still thinking this through. It probably needs real code to get
all the details sorted out. The locking could be interesting...  We
need something to flush the queue before we exit from the driver, and
that seems like the obvious synchronisation point. If not that, we
need to add another function call at all the exit points.

> That could work, but the trouble seems to be error propagation.
> mv88e6xxx_write() will always return 0, the operation will be delayed
> until the unlock, and mv88e6xxx_reg_unlock() does not return an error
> code (why would it?).

If the RMU fails and the fallback MDIO also fails, we are in big
trouble, and the switch is probably dead. At which point, do we really
care. netdev_ratelimited_err('Switch has died...') and keep going,
everything afterwards is probably going to go wrong as well.

I don't think there are any instances in the driver where we try to
recover from an MDIO write failure, other than return -ETIMEDOUT or
-EIO or whatever.

> Or the driver might have a worker which periodically sends the GetID
> message and tracks whether the switch responded. Maybe the rescheduling
> intervals of that are dynamically adjusted based on feedback from
> timeouts or successes of register reads/writes. In any case, now we're
> starting to talk about really complex logic. And it's not clear how
> effective any of these mechanisms would be against random and sporadic
> timeouts rather than persistent issues.

I don't think we can assume every switch has an equivalent of
GetID. So a solution like this would have to be per driver. Given the
potential complexity, it would probably be better to have it in the
core, everybody shares it, debugs it, and makes sure it works well.

      Andrew