[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.44L0.1703101537280.2161-100000@iolanthe.rowland.org>
Date: Fri, 10 Mar 2017 16:14:02 -0500 (EST)
From: Alan Stern <stern@...land.harvard.edu>
To: Guenter Roeck <linux@...ck-us.net>
cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Douglas Anderson <dianders@...omium.org>,
Brian Norris <briannorris@...omium.org>,
<linux-usb@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] usb: hub: Fix error loop seen after hub communication
errors
On Wed, 8 Mar 2017, Guenter Roeck wrote:
> While stress testing a usb controller using a bind/unbind looop, the
> following error loop was observed.
>
> usb 7-1.2: new low-speed USB device number 3 using xhci-hcd
> usb 7-1.2: hub failed to enable device, error -108
> usb 7-1-port2: cannot disable (err = -22)
> usb 7-1-port2: couldn't allocate usb_device
> usb 7-1-port2: cannot disable (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
...
> This continues forever. After adding tracebacks into the code,
> the call sequence leading to this is found to be as follows.
>
> [<ffffffc0007fc8e0>] hub_activate+0x368/0x7b8
> [<ffffffc0007fceb4>] hub_resume+0x2c/0x3c
> [<ffffffc00080b3b8>] usb_resume_interface.isra.6+0x128/0x158
> [<ffffffc00080b5d0>] usb_suspend_both+0x1e8/0x288
> [<ffffffc00080c9c4>] usb_runtime_suspend+0x3c/0x98
> [<ffffffc0007820a0>] __rpm_callback+0x48/0x7c
> [<ffffffc00078217c>] rpm_callback+0xa8/0xd4
> [<ffffffc000786234>] rpm_suspend+0x84/0x758
> [<ffffffc000786ca4>] rpm_idle+0x2c8/0x498
> [<ffffffc000786ed4>] __pm_runtime_idle+0x60/0xac
> [<ffffffc00080eba8>] usb_autopm_put_interface+0x6c/0x7c
> [<ffffffc000803798>] hub_event+0x10ac/0x12ac
> [<ffffffc000249bb8>] process_one_work+0x390/0x6b8
> [<ffffffc00024abcc>] worker_thread+0x480/0x610
> [<ffffffc000251a80>] kthread+0x164/0x178
> [<ffffffc0002045d0>] ret_from_fork+0x10/0x40
>
> kick_hub_wq() is called from hub_activate() even after failures to
> communicate with the hub. This results in an endless sequence of
> hub event -> hub activate -> wq trigger -> hub event -> ...
> To break the loop, only trigger the hub event queue if communication
> with the hub is successful.
>
> Signed-off-by: Guenter Roeck <linux@...ck-us.net>
> ---
> drivers/usb/core/hub.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index 5286bf67869a..f22ab428b310 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -1197,7 +1197,8 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
> &hub->leds, LED_CYCLE_PERIOD);
>
> /* Scan all ports that need attention */
> - kick_hub_wq(hub);
> + if (!status)
> + kick_hub_wq(hub);
>
> if (type == HUB_INIT2 || type == HUB_INIT3) {
> /* Allow autosuspend if it was suppressed */
I'm not at all sure this is the best solution. status == 0 means that
an URB was successfully _submitted_; it does not mean that future
communication with the hub will be successful.
The problem arises here because the system is unable to runtime-suspend
the hub (you didn't say why, but probably because of the same
communication problem). When a suspend fails, drivers are informed
that the device is back at full power, and of course the hub driver
tries to determine the hub's current state at that time. When this
fails, there's no reason to keep the hub at full power any longer, so
the system tries to do another runtime suspend. This suspend fails
just like the first one, and there's your loop.
This same kind of thing could happen with any driver. In your case,
the problem could be solved by making sure that when the hub fails to
suspend, it is still at least minimally functional. We don't do this
now; when usb_port_suspend fails there is no checking to see whether
the device is still working.
(It would be interesting to know exactly at which point
usb_port_suspend fails during your test. And why didn't your system
mark the hub as disconnected when the host controller was unbound?)
An even worse problem is what to do when the device is working okay but
runtime suspend still fails (for example, if remote wakeup cannot be
enabled). How do we prevent the system from going into a similar loop
then?
Alan Stern
Powered by blists - more mailing lists