lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Thu, 22 Sep 2022 19:27:55 +0200
From:   Andrew Lunn <andrew@...n.ch>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     Mattias Forsblad <mattias.forsblad@...il.com>,
        netdev@...r.kernel.org, Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, linux@...linux.org.uk,
        ansuelsmth@...il.com
Subject: Re: [PATCH net-next v14 5/7] net: dsa: mv88e6xxx: rmu: Add
 functionality to get RMON

> > I was thinking within mv88e6xxx_read() and mv88e6xxx_write(). Keep a
> > buffer for building requests. Each write call appends the write to the
> > buffer and returns 0. A read call gets appended to the buffer and then
> > executes the RMU. We probably also need to wrap the reg mutex, so that
> > when it is released, any buffered writes get executed. If the RMU
> > fails, we have all the information needed to do the same via MDIO.
> 
> Ah, so you want to make the mv88e6xxx_reg_unlock() become an implicit
> write barrier.

I'm still thinking this through. It probably needs real code to get
all the details sorted out. The locking could be interesting...  We
need something to flush the queue before we exit from the driver, and
that seems like the obvious synchronisation point. If not that, we
need to add another function call at all the exit points.

> That could work, but the trouble seems to be error propagation.
> mv88e6xxx_write() will always return 0, the operation will be delayed
> until the unlock, and mv88e6xxx_reg_unlock() does not return an error
> code (why would it?).

If the RMU fails and the fallback MDIO also fails, we are in big
trouble, and the switch is probably dead. At which point, do we really
care. netdev_ratelimited_err('Switch has died...') and keep going,
everything afterwards is probably going to go wrong as well.

I don't think there are any instances in the driver where we try to
recover from an MDIO write failure, other than return -ETIMEDOUT or
-EIO or whatever.

> Or the driver might have a worker which periodically sends the GetID
> message and tracks whether the switch responded. Maybe the rescheduling
> intervals of that are dynamically adjusted based on feedback from
> timeouts or successes of register reads/writes. In any case, now we're
> starting to talk about really complex logic. And it's not clear how
> effective any of these mechanisms would be against random and sporadic
> timeouts rather than persistent issues.

I don't think we can assume every switch has an equivalent of
GetID. So a solution like this would have to be per driver. Given the
potential complexity, it would probably be better to have it in the
core, everybody shares it, debugs it, and makes sure it works well.

      Andrew

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ