lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <X/ycQpu7NIGI969v@gerhold.net>
Date:   Mon, 11 Jan 2021 19:43:14 +0100
From:   Stephan Gerhold <stephan@...hold.net>
To:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     linux-kernel@...r.kernel.org, Peter Chen <Peter.Chen@....com>,
        Kishon Vijay Abraham I <kishon@...com>,
        linux-usb@...r.kernel.org, Thierry Reding <treding@...dia.com>
Subject: Infinite recursion in device_reorder_to_tail() due to circular
 device links

Hi,

since 5.11-rc1 I get kernel crashes with infinite recursion in
device_reorder_to_tail() in some situations... It's a bit complicated to
explain so I want to apologize in advance for the long mail. :)

  Kernel panic - not syncing: kernel stack overflow
  CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1
  Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
  Call trace:
   ...
   device_reorder_to_tail+0x4c/0xf0
   device_reorder_to_tail+0x98/0xf0
   device_reorder_to_tail+0x60/0xf0
   device_reorder_to_tail+0x60/0xf0
   device_reorder_to_tail+0x60/0xf0
   ...

The crash happens only in 5.11 with commit 5b6164d3465f ("driver core:
Reorder devices on successful probe"). It stops happening when I revert
this commit. But I don't think this commit is the actual problem...

It's easy to reproduce on any device based on the Qualcomm MSM8916 SoC
by adding a random regulator to the USB node, e.g.:

diff --git a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
index 3a9538e1ec97..9f43fce9e6e3 100644
--- a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
+++ b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
@@ -372,6 +372,7 @@ codec {
 
 &usb {
 	status = "okay";
+	vbus-supply = <&pm8916_l5>;
 	extcon = <&usb_id>, <&usb_id>;
 
 	pinctrl-names = "default", "device";

I searched for problems in the regulator core but the problem actually
has nothing to do with regulators: The additional regulator supply just
delays probing of the USB driver long enough to trigger the issue.

Adding some debug output to device_reorder_to_tail() reveals that it
keeps recursing over the same 4 devices:

  msm_hsusb 78d9000.usb: device_reorder_to_tail()
  ci_hdrc ci_hdrc.0: device_reorder_to_tail()
  qcom_usb_hs_phy ci_hdrc.0.ulpi: device_reorder_to_tail()
  phy phy-ci_hdrc.0.ulpi.0: device_reorder_to_tail()
  msm_hsusb 78d9000.usb: device_reorder_to_tail()
  ...

The device hierarchy of these is (children devices):

  78d9000.usb -> ci_hdrc.0 -> ci_hdrc.0.ulpi -> phy-ci_hdrc.0.ulpi.0

ci_hdrc.0 calls phy_get(dev->parent, "usb-phy"). In phy_get(),
the phy-core then attempts to add the following device link:

  phy-ci_hdrc.0.ulpi.0 -> 78d9000.usb

The device link creation in phy-core is optional (see commit 1d7cb11e1090
("phy: core: Fix phy_get() to not return error on link creation failure"))
because this device link is circular in case of ULPI PHYs (like here).

And indeed, creating this device link usually fails (as it should).
However, adding the "vbus-supply" changes probe order in some way that
this circular device link ends up being created:
  /sys/class/devlink/phy-ci_hdrc.0.ulpi.0--78d9000.usb/ exists only when
I add the "vbus-supply" as in the diff above.

Apparently, there is a special situation where device_is_dependent()
does not work properly, and therefore the driver core allows creating
the circular device link.

To show the problem, I enabled some debug messages and added the
following log message:

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 25e08e5f40bd..ff1344eabb31 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3089,9 +3089,11 @@ int device_add(struct device *dev)
 	}
 
 	bus_probe_device(dev);
-	if (parent)
+	if (parent) {
+		dev_info(dev, "add to parent %s\n", dev_name(parent));
 		klist_add_tail(&dev->p->knode_parent,
 			       &parent->p->klist_children);
+	}
 
 	if (dev->class) {
 		mutex_lock(&dev->class->p->mutex);

Running this with "vbus-supply" (where it crashes) produces:

     bus: 'platform': probing driver msm_hsusb with device 78d9000.usb
       <some probe deferrals while waiting for the regulator>
     bus: 'platform': probing driver msm_hsusb with device 78d9000.usb
     bus: 'platform': probing driver ci_hdrc with device ci_hdrc.0
     bus: 'ulpi': probing driver qcom_usb_hs_phy with device ci_hdrc.0.ulpi
     phy phy-ci_hdrc.0.ulpi.0: add to parent ci_hdrc.0.ulpi
     qcom_usb_hs_phy ci_hdrc.0.ulpi: add to parent ci_hdrc.0
 (1) msm_hsusb 78d9000.usb: Linked as a consumer to phy-ci_hdrc.0.ulpi.0
 (2) ci_hdrc ci_hdrc.0: add to parent 78d9000.usb
     Kernel panic - not syncing: kernel stack overflow
      ...

Note how ci_hdrc is added to the children list of 78d9000.usb (2) after
the device link was already created in (1). This is why device_is_dependent()
does not realize the devices will eventually be dependent on each other.

Without "vbus-supply" (2) happens before (1) because then the PHY driver
requests probe deferral a few times while waiting for regulators.

I'm not really sure what to do here, any suggestions how to fix this properly?

(In case it helps, I have attached an example kernel module (circdl.c)
 that allows reproducing the exact same crash.)

Thanks in advance!
Stephan

View attachment "circdl.c" of type "text/plain" (1999 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ