lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 22 Aug 2022 12:00:28 -0400
From:   Sean Anderson <sean.anderson@...o.com>
To:     "Russell King (Oracle)" <linux@...linux.org.uk>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     Andrew Lunn <andrew@...n.ch>,
        Heiner Kallweit <hkallweit1@...il.com>, netdev@...r.kernel.org,
        Paolo Abeni <pabeni@...hat.com>,
        "David S . Miller" <davem@...emloft.net>,
        linux-kernel@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>,
        Vladimir Oltean <olteanv@...il.com>
Subject: Re: [PATCH net] net: phy: Warn if phy is attached when removing



On 8/19/22 8:20 PM, Russell King (Oracle) wrote:
> On Fri, Aug 19, 2022 at 04:45:19PM -0700, Jakub Kicinski wrote:
>> On Tue, 16 Aug 2022 12:37:01 -0400 Sean Anderson wrote:
>> > netdevs using phylib can be oopsed from userspace in the following
>> > manner:
>> > 
>> > $ ip link set $iface up
>> > $ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \
>> >       /sys/class/net/$iface/phydev/driver/unbind
>> > $ ip link set $iface down
>> > 
>> > However, the traceback provided is a bit too late, since it does not
>> > capture the root of the problem (unbinding the driver). It's also
>> > possible that the memory has been reallocated if sufficient time passes
>> > between when the phy is detached and when the netdev touches the phy
>> > (which could result in silent memory corruption). Add a warning at the
>> > source of the problem. A future patch could make this more robust by
>> > calling dev_close.
>> 
>> FWIW, I think DaveM marked this patch as changes requested.
>> 
>> I don't really know enough to have an opinion.
>> 
>> PHY maintainers, anyone willing to cast a vote?
> 
> I don't think Linus is a fan of using WARN_ON() as an assert()
> replacement, which this feels very much like that kind of thing.
> I don't see much point in using WARN_ON() here as we'll soon get
> a kernel oops anyway, and the backtrace WARN_ON() will produce isn't
> useful - it'll be just noise.
> 
> So, I'd tone it down to something like:
> 
> 	if (phydev->attached_dev)
> 		phydev_err(phydev, "Removing in-use PHY, expect an oops");

That's fine by me

> Maybe even introduce phydev_crit() just for this message.
> 
> Since we have bazillions of network drivers that hold a reference to
> the phydev, I don't think we can do much better than this for phylib.
> It would be a massive effort to go through all the network drivers
> and try to work out how to fix them.

In the last thread I posted this snippet:

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index a74b320f5b27..05894e1c3e59 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -27,6 +27,7 @@
 #include <linux/phy.h>
 #include <linux/phy_led_triggers.h>
 #include <linux/property.h>
+#include <linux/rtnetlink.h>
 #include <linux/sfp.h>
 #include <linux/skbuff.h>
 #include <linux/slab.h>
@@ -3111,6 +3112,13 @@ static int phy_remove(struct device *dev)
 {
        struct phy_device *phydev = to_phy_device(dev);
 
+	// I'm pretty sure this races with multiple unbinds...
+       rtnl_lock();
+       device_unlock(dev);
+       dev_close(phydev->attached_dev);
+       device_lock(dev);
+       rtnl_unlock();
+       WARN_ON(phydev->attached_dev);
+
        cancel_delayed_work_sync(&phydev->state_queue);
 
        mutex_lock(&phydev->lock);
---

Would this be acceptable? Can the locking be fixed?

> Addressing the PCS part of the patch posting and unrelated to what we
> do for phylib...
> 
> However, I don't see "we'll do this for phylib, so we can do it for
> PCS as well" as much of a sane argument - we don't have bazillions
> of network drivers to fix to make this work for PCS. We currently
> have no removable PCS (they don't appear with a driver so aren't
> even bound to anything.) So we should add the appropriate handling
> when we start to do this.
> 
> Phylink has the capability to take the link down when something goes
> away, and if the PCS goes away, that's exactly what should happen,
> rather than oopsing the kernel.

Yeah, but we can't just call phylink_stop; we have to call the function
which will call phylink_stop, which is different for MAC drivers and
for DSA drivers. I think we'd need something like

struct phylink_pcs *pcs_get(struct device *dev, const char *id,
			    void (*detach)(struct phylink_pcs *, void *),
			    void *priv)

which would also require that pcs_get is called before phylink_start,
instead of in probe (which is what every existing user does).

> As MAC drivers hold a reference to the PCS instances, as they need to
> select the appropriate one, how do MAC drivers get to know that the
> PCS has gone away to drop their reference - and tell phylink that the
> PCS has gone. That's the problem that needs solving to allow PCS to
> be unbound if we're going to make them first class citizens of the
> driver model.
> 
> I am no fan of "but XYZ doesn't care about it, so why should we care"
> arguments. If I remember correctly, phylib pre-dates the device model,
> and had the device model retrofitted, so it was a best-efforts
> attempt - and suffered back then with the same problem of needing
> lots of drivers to be changed in non-trivial ways.
> 
> We have the chance here to come up with something better - and I think
> that chance should be used to full effect.

---

This whole thing has me asking the question: why do we allow unbinding
in-use devices in the first place?

--Sean

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ