Message-ID: <5acfd382-28d5-4d74-997c-361499cc0bb0@actia.se>
Date: Thu, 12 Jun 2025 13:23:29 +0000
From: John Ernberg <john.ernberg@...ia.se>
To: Xu Yang <xu.yang_2@....com>, Shawn Guo <shawnguo@...nel.org>
CC: Peter Chen <peter.chen@...nel.org>, Shawn Guo <shawnguo2@...h.net>,
	"imx@...ts.linux.dev" <imx@...ts.linux.dev>, "linux-usb@...r.kernel.org"
	<linux-usb@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>
Subject: Re: i.MX kernel hangup caused by chipidea USB gadget driver

Hi Xu, Shawn,

On 6/10/25 1:30 PM, Xu Yang wrote:
> Hi John,
> 
> On Mon, Jun 09, 2025 at 02:17:30PM +0000, John Ernberg wrote:
>> Hi Shawn, Xu,
>>
>> On Mon, Jun 09, 2025 at 07:53:22PM +0800, Xu Yang wrote:
>>> Hi Shawn,
>>>
>>> Thanks for your reports!
>>>
>>> On Mon, Jun 09, 2025 at 01:31:06PM +0800, Shawn Guo wrote:
>>>> Hi Xu, Peter,
>>>>
>>>> I'm seeing a kernel hangup on imx8mm-evk board.  It happens when:
>>>>
>>>>   - USB gadget is enabled as Ethernet
>>>>   - There is data transfer over USB Ethernet
>>>>   - Device is going in/out suspend
>>>>
>>>> A simple way to reproduce the issue could be:
>>>>
>>>>   1. Copy a big file (like 500MB) from host PC to device with scp
>>>>
>>>>   2. While the file copy is ongoing, suspend & resume the device like:
>>>>
>>>>      $ echo +3 > /sys/class/rtc/rtc0/wakealarm; echo mem > /sys/power/state
>>>>
>>>>   3. The device will hang up there
>>>>
>>>> I reproduced on the following kernels:
>>>>
>>>>   - Mainline kernel
>>>>   - NXP kernel lf-6.6.y
>>>>   - NXP kernel lf-6.12.y
>>>>
>>>> But NXP kernel lf-6.1.y doesn't have this problem.  I tracked it down to
>>>> Peter's commit [1] on lf-6.1.y, and found that the gadget disconnect &
>>>> connect calls got lost from suspend & resume hooks, when the commit was
>>>> split and pushed upstream.  I confirm that adding the calls back fixes
>>>> the hangup.
>>
>> We probably ran into the same problem trying to bring onboard 6.12, going
>> from 6.1, on iMX8QXP. I managed to trace the hang to EP priming through a
>> combination of debug tracing and BUG_ON experiments. See if it starts
>> splatting with the below change.
>>
>> ----------------->8------------------
>>
>> From 092599ab6f9e20412a7ca1eb118dd2be80cd18ff Mon Sep 17 00:00:00 2001
>> From: John Ernberg <john.ernberg@...ia.se>
>> Date: Mon, 5 May 2025 09:09:01 +0200
>> Subject: [PATCH] USB: ci: gadget: Panic if priming when gadget off
>>
>> ---
>>   drivers/usb/chipidea/udc.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
>> index 2fea263a5e30..544aa4fa2d1d 100644
>> --- a/drivers/usb/chipidea/udc.c
>> +++ b/drivers/usb/chipidea/udc.c
>> @@ -203,8 +203,10 @@ static int hw_ep_prime(struct ci_hdrc *ci, int num, int dir, int is_ctrl)
>>
>>      hw_write(ci, OP_ENDPTPRIME, ~0, BIT(n));
>>
>> -   while (hw_read(ci, OP_ENDPTPRIME, BIT(n)))
>> +   while (hw_read(ci, OP_ENDPTPRIME, BIT(n))) {
>>          cpu_relax();
>> +       BUG_ON(dir == TX && !hw_read(ci, OP_ENDPTCTRL + num, ENDPTCTRL_TXE));
>> +   }
>>      if (is_ctrl && dir == RX && hw_read(ci, OP_ENDPTSETUPSTAT, BIT(num)))
>>          return -EAGAIN;
>>
>> ----------------->8------------------
>>
>> On the iMX8QXP you may additionally run into asynchronous aborts and SError
>> due to a resource being disabled.
>>
>>>>
>>>> ---8<--------------------
>>>>
>>>> diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
>>>> index 8a9b31fd5c89..72329a7eac4d 100644
>>>> --- a/drivers/usb/chipidea/udc.c
>>>> +++ b/drivers/usb/chipidea/udc.c
>>>> @@ -2374,6 +2374,9 @@ static void udc_suspend(struct ci_hdrc *ci)
>>>>           */
>>>>          if (hw_read(ci, OP_ENDPTLISTADDR, ~0) == 0)
>>>>                  hw_write(ci, OP_ENDPTLISTADDR, ~0, ~0);
>>>> +
>>>> +       if (ci->driver && ci->vbus_active && (ci->gadget.state != USB_STATE_SUSPENDED))
>>>> +               usb_gadget_disconnect(&ci->gadget);
>>>>   }
>>>>
>>>>   static void udc_resume(struct ci_hdrc *ci, bool power_lost)
>>>> @@ -2384,6 +2387,9 @@ static void udc_resume(struct ci_hdrc *ci, bool power_lost)
>>>>                                          OTGSC_BSVIS | OTGSC_BSVIE);
>>>>                  if (ci->vbus_active)
>>>>                          usb_gadget_vbus_disconnect(&ci->gadget);
>>>> +       } else {
>>>> +               if (ci->driver && ci->vbus_active)
>>>> +                       usb_gadget_connect(&ci->gadget);
>>>>          }
>>>>
>>>>          /* Restore value 0 if it was set for power lost check */
>>>>
>>>> ---->8------------------
> 
> Does above change work for you?
> 

I have now run suspend/resume tests for about 12 hours with this change
without any trouble on iMX8QXP, where it was previously not possible to
run such tests for anywhere near that long.
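
For anyone following along, here is a rough userspace model of why the
unbounded poll in hw_ep_prime() can spin forever. It is only a sketch:
struct fake_udc, fake_read_prime() and prime_bounded() are made-up names,
not chipidea code, and the rule that the prime bit only clears while the
endpoint is enabled is a modelling assumption based on the BUG_ON
experiment quoted above, not something taken from the controller manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the controller; not the real chipidea registers. */
struct fake_udc {
	uint32_t endptprime;        /* models OP_ENDPTPRIME */
	bool tx_enabled;            /* models ENDPTCTRL_TXE for one endpoint */
	unsigned polls_until_clear; /* reads needed before the bit clears */
};

/*
 * Models one register read. Assumption: the "hardware" clears the prime
 * bits only while the endpoint is enabled; otherwise they stay set.
 */
static uint32_t fake_read_prime(struct fake_udc *u, uint32_t mask)
{
	if (u->tx_enabled && u->endptprime) {
		if (u->polls_until_clear == 0)
			u->endptprime = 0;
		else
			u->polls_until_clear--;
	}
	return u->endptprime & mask;
}

/*
 * Bounded variant of the poll loop: returns 0 once the prime bit clears,
 * -1 if it is still set after max_polls polls. The loop in hw_ep_prime()
 * has no such bound, so the second case would never return there.
 */
static int prime_bounded(struct fake_udc *u, int n, unsigned max_polls)
{
	uint32_t bit = 1u << n;

	u->endptprime |= bit;
	while (fake_read_prime(u, bit)) {
		if (max_polls-- == 0)
			return -1;
	}
	return 0;
}
```

With tx_enabled set, prime_bounded() returns 0 after a few polls. With it
cleared, the bit never drops and the call gives up with -1, which is the
case the disconnect in udc_suspend() avoids by stopping transfers before
the controller goes down.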

Please pick up my tag if you submit this formally:

Tested-by: John Ernberg <john.ernberg@...ia.se> # iMX8QXP

Thanks! // John Ernberg
