lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4eaddfc6-92f0-d777-e58b-9db2db618823@roeck-us.net>
Date:   Fri, 10 Mar 2017 15:37:51 -0800
From:   Guenter Roeck <linux@...ck-us.net>
To:     Alan Stern <stern@...land.harvard.edu>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Douglas Anderson <dianders@...omium.org>,
        Brian Norris <briannorris@...omium.org>,
        linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] usb: hub: Fix error loop seen after hub communication
 errors

Hi Alan,

On 03/10/2017 01:14 PM, Alan Stern wrote:
> On Wed, 8 Mar 2017, Guenter Roeck wrote:
>
>> While stress testing a usb controller using a bind/unbind looop, the
>> following error loop was observed.
>>
>> usb 7-1.2: new low-speed USB device number 3 using xhci-hcd
>> usb 7-1.2: hub failed to enable device, error -108
>> usb 7-1-port2: cannot disable (err = -22)
>> usb 7-1-port2: couldn't allocate usb_device
>> usb 7-1-port2: cannot disable (err = -22)
>> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
>> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
>> hub 7-1:1.0: activate --> -22
>> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
>> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
>> hub 7-1:1.0: activate --> -22
> ...
>
>> This continues forever. After adding tracebacks into the code,
>> the call sequence leading to this is found to be as follows.
>>
>> [<ffffffc0007fc8e0>] hub_activate+0x368/0x7b8
>> [<ffffffc0007fceb4>] hub_resume+0x2c/0x3c
>> [<ffffffc00080b3b8>] usb_resume_interface.isra.6+0x128/0x158
>> [<ffffffc00080b5d0>] usb_suspend_both+0x1e8/0x288
>> [<ffffffc00080c9c4>] usb_runtime_suspend+0x3c/0x98
>> [<ffffffc0007820a0>] __rpm_callback+0x48/0x7c
>> [<ffffffc00078217c>] rpm_callback+0xa8/0xd4
>> [<ffffffc000786234>] rpm_suspend+0x84/0x758
>> [<ffffffc000786ca4>] rpm_idle+0x2c8/0x498
>> [<ffffffc000786ed4>] __pm_runtime_idle+0x60/0xac
>> [<ffffffc00080eba8>] usb_autopm_put_interface+0x6c/0x7c
>> [<ffffffc000803798>] hub_event+0x10ac/0x12ac
>> [<ffffffc000249bb8>] process_one_work+0x390/0x6b8
>> [<ffffffc00024abcc>] worker_thread+0x480/0x610
>> [<ffffffc000251a80>] kthread+0x164/0x178
>> [<ffffffc0002045d0>] ret_from_fork+0x10/0x40
>>
>> kick_hub_wq() is called from hub_activate() even after failures to
>> communicate with the hub. This results in an endless sequence of
>> hub event -> hub activate -> wq trigger -> hub event -> ...
>> To break the loop, only trigger the hub event queue if communication
>> with the hub is successful.
>>
>> Signed-off-by: Guenter Roeck <linux@...ck-us.net>
>> ---
>>  drivers/usb/core/hub.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
>> index 5286bf67869a..f22ab428b310 100644
>> --- a/drivers/usb/core/hub.c
>> +++ b/drivers/usb/core/hub.c
>> @@ -1197,7 +1197,8 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
>>  				&hub->leds, LED_CYCLE_PERIOD);
>>
>>  	/* Scan all ports that need attention */
>> -	kick_hub_wq(hub);
>> +	if (!status)
>> +		kick_hub_wq(hub);
>>
>>  	if (type == HUB_INIT2 || type == HUB_INIT3) {
>>  		/* Allow autosuspend if it was suppressed */
>
> I'm not at all sure this is the best solution.  status == 0 means that
> an URB was successfully _submitted_; it does not mean that future
> communication with the hub will be successful.
>
> The problem arises here because the system is unable to runtime-suspend
> the hub (you didn't say why, but probably because of the same
> communication problem).  When a suspend fails, drivers are informed

Yes, exactly.

> that the device is back at full power, and of course the hub driver
> tries to determine the hub's current state at that time.  When this
> fails, there's no reason to keep the hub at full power any longer, so
> the system tries to do another runtime suspend.  This suspend fails
> just like the first one, and there's your loop.
>

If we don't cut the loop in hub_activate(), where else would be
a good place to cut it ?

> This same kind of thing could happen with any driver.  In your case,
> the problem could be solved by making sure that when the hub fails to
> suspend, it is still at least minimally functional.  We don't do this
> now; when usb_port_suspend fails there is no checking to see whether
> the device is still working.
>
> (It would be interesting to know exactly at which point
> usb_port_suspend fails during your test.  And why didn't your system
> mark the hub as disconnected when the host controller was unbound?)
>
It does, but that happens a bit later. If I disable console logging
(for example by using dev_err_ratelimited() instead of dev_err() for the
observed errors), the hub is marked as disabled after a few hundred
or a few thousand loops. Only that code never gets to run runs if console
logging is enabled, and to restrict logging didn't seem to be a good
solution for the problem.

The test script I am running is:

while true
do
     for i in /sys/bus/platform/drivers/rockchip-dwc3/usb*; do
     	basename $i > $(dirname $i)/unbind
	basename $i > $(dirname $i)/bind
     done
     sleep $(awk "BEGIN {printf \"%.2f\", $((RANDOM % 150)) / 100 + 0.5}")
done

where rockchip-dwc3 is a slightly more complex version of dwc3-of-simple.c,
with pretty much the same functionality, only that it ties into extcon
to be able to handle cable states. The hub disconnect is a result of a call
to of_platform_depopulate(), similar to the same call in dwc3-of-simple.c.

The sleep in the above script is essential; it results in all kinds of
race conditions. The one I am trying to fix here is just one of them. The
system under test has two Type-C ports; one is connected to an Ethernet
interface and the other to an Apple Type-C adapter (it probably doesn't
matter much what is connected, as long as there is a hub).

> An even worse problem is what to do when the device is working okay but
> runtime suspend still fails (for example, if remote wakeup cannot be
> enabled).  How do we prevent the system from going into a similar loop
> then?
>

Good question. I am open to suggestions.

Thanks,
Guenter

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ