lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 13 Feb 2021 14:33:31 +0100
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     Hans de Goede <hdegoede@...hat.com>
Cc:     Matti Vaittinen <matti.vaittinen@...rohmeurope.com>,
        mazziesaccount@...il.com, "Rafael J. Wysocki" <rafael@...nel.org>,
        MyungJoo Ham <myungjoo.ham@...sung.com>,
        Chanwoo Choi <cw00.choi@...sung.com>,
        Andy Gross <agross@...nel.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Jean Delvare <jdelvare@...e.com>,
        Guenter Roeck <linux@...ck-us.net>,
        Mark Gross <mgross@...ux.intel.com>,
        Sebastian Reichel <sre@...nel.org>,
        Chen-Yu Tsai <wens@...e.org>,
        Liam Girdwood <lgirdwood@...il.com>,
        Mark Brown <broonie@...nel.org>,
        Wim Van Sebroeck <wim@...ux-watchdog.org>,
        Saravana Kannan <saravanak@...gle.com>,
        Heikki Krogerus <heikki.krogerus@...ux.intel.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Joerg Roedel <jroedel@...e.de>,
        Dan Williams <dan.j.williams@...el.com>,
        Bartosz Golaszewski <bgolaszewski@...libre.com>,
        linux-kernel@...r.kernel.org, linux-arm-msm@...r.kernel.org,
        linux-hwmon@...r.kernel.org, platform-driver-x86@...r.kernel.org,
        linux-pm@...r.kernel.org, linux-watchdog@...r.kernel.org
Subject: Re: [RFC PATCH 1/7] drivers: base: Add resource managed version of
 delayed work init

On Sat, Feb 13, 2021 at 02:18:06PM +0100, Hans de Goede wrote:
> Hi,
> 
> On 2/13/21 1:16 PM, Greg Kroah-Hartman wrote:
> > On Sat, Feb 13, 2021 at 01:58:44PM +0200, Matti Vaittinen wrote:
> >> A few drivers which need a delayed work-queue must cancel work at exit.
> >> Some of those implement remove solely for this purpose. Help drivers
> >> to avoid unnecessary remove and error-branch implementation by adding
> >> managed verision of delayed work initialization
> >>
> >> Signed-off-by: Matti Vaittinen <matti.vaittinen@...rohmeurope.com>
> > 
> > That's not a good idea.  As this would kick in when the device is
> > removed from the system, not when it is unbound from the driver, right?
> 
> Erm, no devm managed resources get released when the driver is detached:
> drivers/base/dd.c: __device_release_driver() calls devres_release_all(dev);

Then why do you have to manually call devm_free_irq() in release
callbacks?  I thought that was the primary problem with those things.

I can understand devm_ calls handling resources, but callbacks and
workqueues feels like a big stretch.

> > There is two different lifespans here (well 3).  Code and data*2.  Don't
> > confuse them as that will just cause lots of problems.
> > 
> > The move toward more and more "devm" functions is not the way to go as
> > they just more and more make things easier to get wrong.
> > 
> > APIs should be impossible to get wrong, this one is going to be almost
> > impossible to get right.
> 
> I have to disagree here devm generally makes it easier to get things right,
> it is when some devm functions are missing and devm and non devm resources
> are mixed that things get tricky.
> 
> Lets look for example at the drivers/extcon/extcon-intel-int3496.c code
> from patch 2/7 from this set. The removed driver-remove function looks like
> this:
> 
> -static int int3496_remove(struct platform_device *pdev)
> -{
> -	struct int3496_data *data = platform_get_drvdata(pdev);
> -
> -	devm_free_irq(&pdev->dev, data->usb_id_irq, data);
> -	cancel_delayed_work_sync(&data->work);
> -
> -	return 0;
> -}
> -
> 
> This is a good example where the mix of devm and non devm (the workqueue)
> resources makes things tricky. The IRQ must be freed first to avoid the
> work potentially getting re-queued after the sync cancel.
> 
> In this case using devm for the IRQ may cause the driver author to forget
> about this, leaving a race.
> 
> Bit with the new proposed devm_delayed_work_autocancel() function things
> will just work.
> 
> This work gets queued by the IRQ handler, so the work must be initialized (1)
> *before* devm_request_irq() gets called. Any different order would be a
> bug in the probe function since then the IRQ might run before the work
> is initialized.

How are we now going to audit the order of these calls to ensure that
this is done correctly?  That still feels like it is ripe for bugs in a
much easier way than without these functions.

> Since devm unrolls / releases resources in reverse order, this means that
> it will automatically free the IRQ (which was requested later) before
> cancelling the work.
> 
> So by switching to the new devm_delayed_work_autocancel() function we avoid
> a case where a driver author can cause a race on driver detach because it is
> relying on devm to free the IRQ, which may cause it to requeue a just
> cancelled work.
> 
> IOW introducing this function (and using it where appropriate) actually
> removes a possible class of bugs.
> 
> patch 2/7 actually has a nice example of this, drivers/extcon/extcon-gpio.c
> also uses a delayed work queued by an interrupt, together with devm managing
> the interrupt, yet the removed driver_remove callback:
> 
> -static int gpio_extcon_remove(struct platform_device *pdev)
> -{
> -	struct gpio_extcon_data *data = platform_get_drvdata(pdev);
> -
> -	cancel_delayed_work_sync(&data->work);
> -
> -	return 0;
> -}
> -
> 
> Is missing the explicit free on the IRQ which is necessary to avoid
> the race. One the one hand this illustrates your (Greg's) argument that
> devm managed IRQs may be a bad idea.

I still think it is :)

> OTOH it shows that if we have devm managed IRQs anyways that then also
> having devm managed autocancel works is a good idea, since this RFC patch-set
> not only results in some cleanup, but is actually fixing at least 1 driver
> detach race condition.

Fixing bugs is good, but the abstraction away from resource management
that the devm_ calls cause is worrying as the "magic" behind them can be
wrong, as seen here.

thanks,

greg k-h

Powered by blists - more mailing lists