lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Wed, 16 Aug 2023 21:09:40 +0300
From: Serge Semin <fancer.lancer@...il.com>
To: Andrew Lunn <andrew@...n.ch>,
	Heiner Kallweit <hkallweit1@...il.com>,
	Russell King <linux@...linux.org.uk>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>,
	Francesco Dolcini <francesco.dolcini@...adex.com>
Cc: Serge Semin <fancer.lancer@...il.com>,
	netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: [RFC net] Revert "net: phy: Fix race condition on link status change"

Protecting the phy_driver.drv->handle_interrupt() callback invocation by
the phy_device.lock mutex causes all the IRQ-capable PHY drivers to lock
the mutex twice thus deadlocking on the next calls thread:
IRQ: phy_interrupt()
     +-> mutex_lock(&phydev->lock); <-------------+
         drv->handle_interrupt()                  | Deadlock due to the
         +-> phy_error()                          + nested PHY-device
             +-> phy_process_error()              | mutex lock
                 +-> mutex_lock(&phydev->lock); <-+
                     phydev->state = PHY_ERROR;
                     mutex_unlock(&phydev->lock);
         mutex_unlock(&phydev->lock);

The problem can be easily reproduced just by calling phy_error() from the
any PHY-device interrupt handler. Reverting the commit 91a7cda1f4b8 ("net:
phy: Fix race condition on link status change") fixes the deadlock.

This reverts commit 91a7cda1f4b8bdf770000a3b60640576dafe0cec.

Fixes: 91a7cda1f4b8 ("net: phy: Fix race condition on link status change")
Signed-off-by: Serge Semin <fancer.lancer@...il.com>

---

Since obviously it would be better to fix both the deadlock and the
problem described in the blamed commit the patch is marked as RFC. I am
not aware of a better solution for now than to revert the commit caused
the regression. So let's discuss to find out whether it's possible to have
a better fix here.

---
 drivers/net/phy/phy.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index bdf00b2b2c1d..9483bd57158e 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1235,7 +1235,6 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 {
 	struct phy_device *phydev = phy_dat;
 	struct phy_driver *drv = phydev->drv;
-	irqreturn_t ret;
 
 	/* Wakeup interrupts may occur during a system sleep transition.
 	 * Postpone handling until the PHY has resumed.
@@ -1259,11 +1258,7 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 		return IRQ_HANDLED;
 	}
 
-	mutex_lock(&phydev->lock);
-	ret = drv->handle_interrupt(phydev);
-	mutex_unlock(&phydev->lock);
-
-	return ret;
+	return drv->handle_interrupt(phydev);
 }
 
 /**
-- 
2.41.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ