linux-kernel - BUG in mmc: core: Disable card detect during shutdown

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <55A0788B-03E8-457E-B093-40FD93F1B9F3@goldelico.com>
Date:   Mon, 30 May 2022 18:55:46 +0200
From:   "H. Nikolaus Schaller" <hns@...delico.com>
To:     Ulf Hansson <ulf.hansson@...aro.org>
Cc:     Discussions about the Letux Kernel <letux-kernel@...nphoenux.org>,
        kernel@...a-handheld.com, aTc <atc@...-p.org>,
        Tony Lindgren <tony@...mide.com>,
        linux-omap <linux-omap@...r.kernel.org>,
        linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-mmc@...r.kernel.org
Subject: BUG in mmc: core: Disable card detect during shutdown

Hi Ulf,
users did report a strange issue that the OMAP5 based Pyra does not
shutdown if a kernel 5.10.116 is used.

Someone did a bisect and found that reverting

0d66b395210c5084c2b7324945062c1d1f95487a

resp. upstream

66c915d09b942fb3b2b0cb2f56562180901fba17

solves it.

I could now confirm that it also happens with v5.18.0.
But interestingly only on the Pyra handheld device and not
on the omap5evm (which is supported by mainline).

The symptom is:

a) without revert

root@...ux:~# poweroff

Broadcast message from root@...ux (console) (Sat Jan  1 01:08:25 2000):

The system is going down for system halt NOW!
INIT: Sending processes the TERM signal
root@...ux:~# [info] Using makefile-style concurrent boot in runlevel 0.
[....] Stopping cgroup management proxy daemon: cgproxy[....] Stopping cgroup management daemon: cgmanager[....] Stop[ ok bluetooth: /usr/sbin/bluetoothd.
[FAIL] Stopping ISC DHCP server: dhcpd failed!
dhcpcd[3055]: sending signal 15 to pid 2976
dhcpcd[3055]: waiting for pid 2976 to exit
[ ok ] Shutting down ALSA...done.
[ ok ] Asking all remaining processes to terminate...done.
[ ok ] All processes ended within 2 seconds...done.
[ ok [[c[....] Stopping enhanced syslogd: rsyslogd.
[ ok ....] Deconfiguring network interfaces...done.
^[[c[info] Saving the system clock.
[info] Hardware Clock updated to Sat Jan  1 01:08:30 UTC 2000.
[ ok ] Deactivating swap...done.
^[[c[   77.289332] EXT4-fs (mmcblk0p2): re-mounted. Quota mode: none.
[info] Will now halt.

b) with reverting your patch

root@...ux:~# uname -a
Linux letux 5.18.0-letux-lpae+ #9678 SMP PREEMPT Mon May 30 18:02:28 CEST 2022 armv7l GNU/Linux
root@...ux:~# poweroff

Broadcast message from root@...ux (console) (Sat Jan  1 01:39:15 2000):

The system is going down for system halt NOW!
INIT: Sending processes the TERM signal
root@...ux:~# [info] Using makefile-style concurrent boot in runlevel 0.
[FAIL] Stopping cgroup management proxy daemon: cgproxy[....] Stopping ISC DHCP server: dhcpd failed!
[....] Stopping cgroup management daemon: cgmanagerdhcpcd[3100]: sending signal 15 to pid 3013
dhcpcd[3100]: waiting for pid 3013 to exit
[ ok ] Stopping bluetooth: /usr/sbin/bluetoothd.
[ ok ] Shutting down ALSA...done.
[ ok ] Asking all remaining processes to terminate...done.
[ ok ] All processes ended within 3 seconds...done.
[ ok [[c[....] Stopping enhanced syslogd: rsyslogd.
[ ok ....] Deconfiguring network interfaces...done.
^[[c[info] Saving the system clock.
[info] Hardware Clock updated to Sat Jan  1 01:39:21 UTC 2000.
[ ok ] Deactivating swap...done.
^[[c[   44.563256] EXT4-fs (mmcblk0p2): re-mounted. Quota mode: none.
[info] Will now halt.
[   46.917534] reboot: Power down


What I suspect is that we have multiple mmc interfaces and have
card detect wired up in the Pyra while it is ignored in the
EVM. Is it possible that __mmc_stop_host() never returns in
.shutdown_pre if card detect is set up (and potentially
shut down earlier)?

Setup of mmc is done in omap5-board-common.dtsi and omap5.dtsi.

Out Pyra has a non-upstream device tree where we use
omap5-board-common.dtsi and overwrite it by e.g.

&mmc4 { /* second (u)SD slot (SDIO capable) */
	status = "okay";
	vmmc-supply = <&ldo2_reg>;
	pinctrl-names = "default";
	pinctrl-0 = <&mmc4_pins>;
	bus-width = <4>;
	cd-gpios = <&gpio3 13 GPIO_ACTIVE_LOW>;	/* gpio3_77 */
	wp-gpios = <&gpio3 15 GPIO_ACTIVE_HIGH>;	/* gpio3_79 */
};

But I have tried to remove the cd-gpois and wp-gpois. Or make the
mmc interface being disabled (but I may not have catched everything
in first place).

Then I added some printk to mmc_stop_host() and __mmc_stop_host().

mmc_stop_host() is not called but __mmc_stop_host() is called 4 times.
There are 4 active MMC interfaces in the Pyra - 3 for (µ)SD slots
and one for an SDIO WLAN module.

Now it looks as if 3 of them are properly teared down (two of them
seem to have host->slot.cd_irq >= 0) but on the fourth call
cancel_delayed_work_sync(&host->detect); does not return. This is
likely the location of the stall why we don't see a "reboot: Power down"

Any ideas?

BR and thanks,
Nikolaus

printk hack:

void __mmc_stop_host(struct mmc_host *host)
{
printk("%s 1\n", __func__);
	if (host->slot.cd_irq >= 0) {
printk("%s 2\n", __func__);
		mmc_gpio_set_cd_wake(host, false);
printk("%s 3\n", __func__);
		disable_irq(host->slot.cd_irq);
printk("%s 4\n", __func__);
	}

	host->rescan_disable = 1;
printk("%s 5\n", __func__);
	cancel_delayed_work_sync(&host->detect);
printk("%s 6\n", __func__);
}

resulting log:

[info] Will now halt.
[  282.780929] __mmc_stop_host 1
[  282.784276] __mmc_stop_host 2
[  282.787735] __mmc_stop_host 3
[  282.791030] __mmc_stop_host 4
[  282.794235] __mmc_stop_host 5
[  282.797369] __mmc_stop_host 6
[  282.800918] __mmc_stop_host 1
[  282.804269] __mmc_stop_host 5
[  282.807541] __mmc_stop_host 6
[  282.810715] __mmc_stop_host 1
[  282.813842] __mmc_stop_host 2
[  282.816984] __mmc_stop_host 3
[  282.820175] __mmc_stop_host 4
[  282.823302] __mmc_stop_host 5
[  282.826449] __mmc_stop_host 6
[  282.830941] __mmc_stop_host 1
[  282.834076] __mmc_stop_host 5

--- here should be another __mmc_stop_host 6
--- and reboot: Power down