lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Fri, 12 Sep 2014 10:08:48 -0600
From:	Stephen Warren <swarren@...dotorg.org>
To:	Chris Ball <chris@...ntf.net>, Ulf Hansson <ulf.hansson@...aro.org>
CC:	Russell King <linux@....linux.org.uk>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Alexandre Courbot <acourbot@...dia.com>,
	"linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Linus Walleij <linus.walleij@...aro.org>
Subject: Re: mmc: freeing host while host->detect work queue is still active

On 09/11/2014 04:03 PM, Stephen Warren wrote:
> Running Fedora rawhides's 3.17.0-0.rc4.git2.1.fc22.armv7hl kernel on
> Jetson TK1 (an ARM board containing Tegra SoC), I see the following
> during boot most times the Tegra SDHCI driver defers probe for the SD slot:

(and indeed I can reproduce the same issue in plain old v3.17-rc4, so 
this isn't at all Fedora specific)

>> [    8.377719] sdhci-tegra 700b0400.sdhci: Got CD GPIO #170.
>> [    8.377780] sdhci-tegra 700b0400.sdhci: Got WP GPIO #132.
>> [    8.377898] mmc1: Unknown controller version (3). You may
>> experience problems.
>> [    8.379796] sdhci-tegra 700b0400.sdhci: No vmmc regulator found
>> [    8.380225] ------------[ cut here ]------------
>> [    8.380243] WARNING: CPU: 2 PID: 6 at lib/debugobjects.c:263
>> debug_print_object+0x8c/0xb4()
>> [    8.380261] ODEBUG: free active (active state 0) object type:
>> timer_list hint: delayed_work_timer_fn+0x0/0x18
>> [    8.380308] Modules linked in: ehci_tegra(+) sdhci_tegra
>> sdhci_pltfm sdhci phy_tegra_usb mmc_core i2c_tegra tegra_drm
>> drm_kms_helper drm host1x
>> [    8.380319] CPU: 2 PID: 6 Comm: kworker/u8:0 Not tainted
>> 3.17.0-0.rc4.git2.1.fc22.armv7hl #1
>> [    8.380336] Workqueue: deferwq deferred_probe_work_func
>> [    8.380358] [<c0218ffc>] (unwind_backtrace) from [<c0212bf0>]
>> (show_stack+0x18/0x1c)
>> [    8.380374] [<c0212bf0>] (show_stack) from [<c094dbe0>]
>> (dump_stack+0x84/0xb0)
>> [    8.380395] [<c094dbe0>] (dump_stack) from [<c0251690>]
>> (warn_slowpath_common+0x70/0x94)
>> [    8.380413] [<c0251690>] (warn_slowpath_common) from [<c02516e8>]
>> (warn_slowpath_fmt+0x34/0x44)
>> [    8.380426] [<c02516e8>] (warn_slowpath_fmt) from [<c057ec6c>]
>> (debug_print_object+0x8c/0xb4)
>> [    8.380439] [<c057ec6c>] (debug_print_object) from [<c057f8d8>]
>> (debug_check_no_obj_freed+0xf4/0x1dc)
>> [    8.380453] [<c057f8d8>] (debug_check_no_obj_freed) from
>> [<c0398174>] (kfree+0x160/0x308)
>> [    8.380465] [<c0398174>] (kfree) from [<c0658094>]
>> (device_release+0x64/0x98)
>> [    8.380479] [<c0658094>] (device_release) from [<c056a8a4>]
>> (kobject_release+0x12c/0x19c)
>> [    8.380498] [<c056a8a4>] (kobject_release) from [<bf2994b0>]
>> (sdhci_tegra_probe+0x190/0x1bc [sdhci_tegra])
>> [    8.380810] [<bf2994b0>] (sdhci_tegra_probe [sdhci_tegra]) from
>> [<c065e024>] (platform_drv_probe+0x34/0x68)
>> [    8.380825] [<c065e024>] (platform_drv_probe) from [<c065c0bc>]
>> (driver_probe_device+0x13c/0x33c)
>> [    8.380838] [<c065c0bc>] (driver_probe_device) from [<c065a5f8>]
>> (bus_for_each_drv+0x8c/0x9c)
>> [    8.380852] [<c065a5f8>] (bus_for_each_drv) from [<c065bf08>]
>> (device_attach+0x70/0x94)
>> [    8.380865] [<c065bf08>] (device_attach) from [<c065b464>]
>> (bus_probe_device+0x30/0xa4)
>> [    8.380878] [<c065b464>] (bus_probe_device) from [<c065b954>]
>> (deferred_probe_work_func+0x88/0xb8)
>> [    8.380894] [<c065b954>] (deferred_probe_work_func) from
>> [<c026b0e8>] (process_one_work+0x2a0/0x5f8)
>> [    8.380908] [<c026b0e8>] (process_one_work) from [<c026c7e4>]
>> (worker_thread+0x2d0/0x40c)
>> [    8.380922] [<c026c7e4>] (worker_thread) from [<c027153c>]
>> (kthread+0xd4/0xe8)
>> [    8.380936] [<c027153c>] (kthread) from [<c020ecc8>]
>> (ret_from_fork+0x14/0x20)
>> [    8.380941] ---[ end trace b1f4b0fe632eb3a4 ]---
>
> The problem is as follows:
>
> The following sequence of calls requests the CD (Change Detect) IRQ for
> the SD slot:
>
> sdhci_tegra_probe()
> sdhci_tegra_parse_dt()
> mmc_of_parse()
> mmc_gpio_request_cd()
> devm_request_threaded_irq()
>
> The IRQ is triggered by a pin on the SD slot, so could be in any state,
> and might fire immediately. This causes the IRQ handler
> mmc_gpio_cd_irqt() to run, which calls mmc_detect_change() which calls
> mmc_schedule_delayed_work(&host->detect, ...); Thus, the work queue can
> be immediately active.
>
> However, if later part of sdhci_tegra_probe() fails, e.g.
> sdhci_add_host() -> mmc_regulator_get_supply() ->
> devm_regulator_get_optional() -> -EPROBE_DEFER, then sdhci_tegra_probe()
> needs to tear everything down, and so calls sdhci_pltfm_free() ->
> sdhci_free_host() -> mmc_free_host() -> put_device(&host->class_dev) ->
> mmc_host_classdev_release() -> kfree(host). However, host->detect is
> part of host and may still be active at this point, which is what
> triggers the ODETECT log spew mentioned above.
>
> This doesn't happen in linux-next, because mmc_gpiod_request_cd()
> doesn't set up the IRQ handler. Rather this happens inside
> mmc_gpiod_request_cd_irq() which is called from mmc_start_host(), which
> I believe is well after deferred probe could occur.
>
> Some possible options for fixing this in 3.17 are:
>
> a) Back-port whatever changes in mainline stopped mmc_of_parse()() from
> requesting the IRQ. That is commit 740a221ef0e5 "mmc: slot-gpio: Add
> GPIO descriptor based CD GPIO API". I haven't investigated how many
> other commits would be required for that commit to apply, but I'm
> nervous it'd be too many to apply to 3.17.

Sorry, I quoted the wrong commit ID there. That commit is already 
present in 3.17 (and some earlier). That commit adds the separate 
mmc_gpiod_request_cd_irq() function that's necessary to solve the 
problem, and adds a call to it from mmc_start_host(). However, it still 
leaves a call to mmc_gpiod_request_cd_irq() in mmc_gpio_request_cd(), 
thus leaving the problem in place.

The following commit from linux-next actually solves the problem for me, 
by having mmc_of_parse() call mmc_gpiod_request_cd() [which doesn't call 
mmc_gpiod_request_cd_irq()] rather than mmc_gpio_request_cd():

98e90de99a0c mmc: host: switch OF parser to use gpio descriptors

That commit relies on at least one of the following two commits in order 
to successfully build directly on top of 3.17-rc4:

9d2fa2428ae1 mmc: slot-gpio: add gpiod variant to get wp GPIO
9fbc695075e9 mmc: slot-gpio: switch to use flags when getting GPIO

Is that too much to apply for 3.17 too?

However I note that this only solves the problem for any user of 
mmc_of_parse(); any code still calling mmc_gpio_request_cd() directly 
could still be exposed to the issue. I wonder if we shouldn't also 
remove the call to mmc_gpiod_request_cd_irq() from mmc_gpio_request_cd().

So...

> b) Fix the -EPROBE_DEFER cleanup path to explicitly free the IRQ (so it
> can't schedule the work queue any more), and then cancel the work queue
> (so it isn't active and doesn't trigger the ODETECT message).
>
> A naive patch might be like:
>
> diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
> index 31969436d77c..8125f916be4a 100644
> --- a/drivers/mmc/core/host.c
> +++ b/drivers/mmc/core/host.c
> @@ -586,6 +586,9 @@ void mmc_free_host(struct mmc_host *host)
>       idr_remove(&mmc_host_idr, host->index);
>       spin_unlock(&mmc_host_lock);
>
> +    mmc_gpiod_free_cd(host);
> +    cancel_delayed_work_sync(&host->detect);
> +
>       put_device(&host->class_dev);
>   }

... something like that (but calling a new mmc_gpiod_free_cd_irq()) 
might still be the best approach. I guess I should just send a patch to 
do that?

> However, that's problematic, since I think mmc_free_host() would have to
> somehow determine whether it needs to call mmc_gpio_free_cd() or
> mmc_gpiod_free_cd()? Perhaps we should *just* call devm_free_irq()
> there, rather than tearing down all the GPIO stuff too, i.e. split out a
> separate mmc_gpio_free_cd_irq() function rather like there's a separate
> IRQ request function?
>
> What approach do people prefer and/or does anyone have any other ideas
> for a fix?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ