Message-ID: <80ab086d-3e6c-4a69-a7ca-82acaf37ad25@lunn.ch>
Date: Sat, 8 Jul 2023 23:31:06 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Linus Walleij <linus.walleij@...aro.org>
Cc: Vivien Didelot <vivien.didelot@...il.com>,
	Florian Fainelli <f.fainelli@...il.com>,
	Vladimir Oltean <olteanv@...il.com>,
	"David S . Miller" <davem@...emloft.net>,
	Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
	Tobias Waldekranz <tobias@...dekranz.com>
Subject: Re: [PATCH net] dsa: mv88e6xxx: Do a few more tries before timing out

On Sat, Jul 08, 2023 at 11:20:30PM +0200, Linus Walleij wrote:
> I get sporadic timeouts from the driver when using the
> MV88E6352. Increasing the timeout rounds solves the problem.
> Some added prints show things like this:
> 
> [   58.356209] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
>     for switch, addr 1b reg 0b, mask 8000, val 0000, data c000
> [   58.367487] mv88e6085 mdio_mux-0.1:00: Timeout waiting for
>     ATU op 4000, fid 0001
> (...)
> [   61.826293] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
>     for switch, addr 1c reg 18, mask 8000, val 0000, data 9860
> [   61.837560] mv88e6085 mdio_mux-0.1:00: Timeout waiting
>     for PHY command 1860 to complete
> 
> The reason is probably not the commands: I think those are
> mostly fine with the 50+50ms timeout, but the problem
> appears when OpenWrt brings up several interfaces in
> parallel on a system with 7 populated ports: if one of
> them take more than 50 ms and waits one or more of the
> others can get stuck on the mutex for the switch and then
> this can easily multiply.

This is one of the classic bugs I keep an eye out for, and I point
developers to iopoll.h to avoid it.

As you say, sleep() or a contended mutex can take a lot longer than
expected, so the loop exits with -ETIMEDOUT even though the operation
has in fact completed successfully. The success just goes unnoticed.

The correct fix is to do one more read of the register after the loop
and test the condition again. Only if that test fails should the
function return -ETIMEDOUT.
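
A minimal sketch of that pattern, written as plain user-space C so it
can be compiled outside the kernel. It is not the mv88e6xxx code: the
names wait_mask(), struct poll_ops, struct fake_dev and the fake_*
helpers are all made up for illustration, and the clock and sleep are
simulated. As far as I know the polling helpers in iopoll.h already do
this final read, which is why I point people there.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical hooks standing in for the register read, clock and sleep. */
struct poll_ops {
	int (*read)(void *ctx, uint16_t *val);
	uint64_t (*now_ms)(void *ctx);
	void (*sleep)(void *ctx);
};

/* Poll until (reg & mask) == want, or until timeout_ms has passed. */
int wait_mask(const struct poll_ops *ops, void *ctx,
	      uint16_t mask, uint16_t want, uint64_t timeout_ms)
{
	uint64_t deadline;
	uint16_t data;
	int err;

	assert(ops && ops->read && ops->now_ms && ops->sleep);
	deadline = ops->now_ms(ctx) + timeout_ms;

	while (ops->now_ms(ctx) < deadline) {
		err = ops->read(ctx, &data);
		if (err)
			return err;
		if ((data & mask) == want)
			return 0;
		/* May overshoot the deadline by a lot (scheduler, mutex). */
		ops->sleep(ctx);
	}

	/*
	 * The deadline has passed, but the hardware may have finished
	 * while we were asleep. Read once more before giving up.
	 */
	err = ops->read(ctx, &data);
	if (err)
		return err;
	if ((data & mask) == want)
		return 0;

	return -ETIMEDOUT;
}

/* Simulated device: the busy bit (0x8000) clears at time ready_at. */
struct fake_dev {
	uint64_t now;       /* simulated time in ms */
	uint64_t ready_at;  /* time at which the busy bit clears */
	uint64_t sleep_len; /* how long each sleep really takes */
};

int fake_read(void *ctx, uint16_t *val)
{
	struct fake_dev *d = ctx;

	*val = d->now >= d->ready_at ? 0x0000 : 0x8000;
	return 0;
}

uint64_t fake_now(void *ctx)
{
	struct fake_dev *d = ctx;

	return d->now;
}

void fake_sleep(void *ctx)
{
	struct fake_dev *d = ctx;

	d->now += d->sleep_len;
}

const struct poll_ops fake_ops = { fake_read, fake_now, fake_sleep };
```

With a simulated device that becomes ready at 60 ms, a 50 ms timeout
and a single sleep that really takes 200 ms, the loop runs out of time
after one iteration, yet wait_mask() returns 0 because of the read
after the loop. Without that read it would report a timeout for an
operation that succeeded.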

	Andrew
