lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220427151053.GA10204@wunner.de>
Date:   Wed, 27 Apr 2022 17:10:53 +0200
From:   Lukas Wunner <lukas@...ner.de>
To:     Alan Stern <stern@...land.harvard.edu>
Cc:     Steve Glendinning <steve.glendinning@...well.net>,
        UNGLinuxDriver@...rochip.com, Oliver Neukum <oneukum@...e.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
        linux-usb@...r.kernel.org, Andre Edich <andre.edich@...rochip.com>,
        Oleksij Rempel <o.rempel@...gutronix.de>,
        Martyn Welch <martyn.welch@...labora.com>,
        Gabriel Hojda <ghojda@...urs.ro>,
        Christoph Fritz <chf.fritz@...glemail.com>,
        Lino Sanfilippo <LinoSanfilippo@....de>,
        Philipp Rosenberger <p.rosenberger@...bus.com>,
        Heiner Kallweit <hkallweit1@...il.com>,
        Andrew Lunn <andrew@...n.ch>,
        Russell King <linux@...linux.org.uk>
Subject: Re: [PATCH net] usbnet: smsc95xx: Fix deadlock on runtime resume

On Wed, Apr 27, 2022 at 10:00:08AM -0400, Alan Stern wrote:
> On Wed, Apr 27, 2022 at 08:41:49AM +0200, Lukas Wunner wrote:
> > Commit 05b35e7eb9a1 ("smsc95xx: add phylib support") amended
> > smsc95xx_resume() to call phy_init_hw().  That function waits for the
> > device to runtime resume even though it is placed in the runtime resume
> > path, causing a deadlock.
> > 
> > The problem is that phy_init_hw() calls down to smsc95xx_mdiobus_read(),
> > which never uses the _nopm variant of usbnet_read_cmd().  Amend it to
> > autosense that it's called from the runtime resume/suspend path and use
> > the _nopm variant if so.
[...]
> > --- a/drivers/net/usb/smsc95xx.c
> > +++ b/drivers/net/usb/smsc95xx.c
> > @@ -285,11 +285,21 @@ static void smsc95xx_mdio_write_nopm(struct usbnet *dev, int idx, int regval)
> >  	__smsc95xx_mdio_write(dev, pdata->phydev->mdio.addr, idx, regval, 1);
> >  }
> >  
> > +static bool smsc95xx_in_pm(struct usbnet *dev)
> > +{
> > +#ifdef CONFIG_PM
> > +	return dev->udev->dev.power.runtime_status == RPM_RESUMING ||
> > +	       dev->udev->dev.power.runtime_status == RPM_SUSPENDING;
> > +#else
> > +	return false;
> > +#endif
> > +}
> 
> This does not do what you want.  You want to know if this function is 
> being called in the resume pathway, but all it really tells you is 
> whether the function is being called while a resume is in progress (and 
> it doesn't even do that very precisely because the code does not use the 
> runtime-pm spinlock).  The resume could be running in a different 
> thread, in which case you most definitely _would_ want to want for it to 
> complete.

I'm aware of that.  I've explored various approaches and none solved
the problem perfectly.  This one seems good enough for all practical
purposes.

One approach I've considered is to use current_work() to determine if
we're called from dev->power.work.  But that only works if the runtime
resume/suspend is asynchronous (RPM_ASYNC is set).  In this case, the
runtime resume is synchronous and called from a different work item
(hub_event).  So the approach is not feasible.

Another approach is to assign a dev_pm_domain to the usb_device, whose
->runtime_resume hook first calls usb_runtime_resume() (so that the
usb_device and usb_interface has status RPM_ACTIVE), *then* calls
phy_init_hw().  Problem is, this only works for runtime resume
and we need a solution for runtime suspend as well.  (The device already
has status RPM_SUSPENDING when the dev_pm_domain's ->runtime_suspend hook
is invoked.)  So not a feasible approach either.

Fudging the runtime_status in the ->runtime_suspend and ->runtime_resume
hooks via pm_runtime_set_active() / _set_suspended() is rejected by the
runtime PM core.

I've even considered walking up the callstack via _RET_IP_ to determine
if one of the callers is smsc95xx_resume() / _suspend().  But I'm not
sure that's reliable and portable across all arches.

And I don't want to clutter phylib with _nopm variants either.

So the approach I've chosen here, while not perfect, does its job,
is simple and uses very little code.  If you've got a better idea,
please let me know.

Thanks,

Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ