linux-kernel - Re: i.MX kernel hangup caused by chipidea USB gadget driver

Message-ID: <aEf/2+3MU5ED2sxE@dragon>
Date: Tue, 10 Jun 2025 17:50:19 +0800
From: Shawn Guo <shawnguo2@...h.net>
To: Xu Yang <xu.yang_2@....com>
Cc: Peter Chen <peter.chen@...nel.org>, Shawn Guo <shawnguo@...nel.org>,
	imx@...ts.linux.dev, linux-usb@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: i.MX kernel hangup caused by chipidea USB gadget driver

On Mon, Jun 09, 2025 at 07:53:22PM +0800, Xu Yang wrote:

<snip>

> During the scp process, the USB host won't put the USB device into suspend state.
> In the current design, the ether driver then doesn't know the system has
> suspended after echo mem. The root cause is that the ether driver is still trying
> to queue USB requests after the USB controller has suspended and the USB clock is off,
> so the system hangs.
> 
> With the above changes, I think the ether driver will fail in eth_start_xmit()
> at an earlier stage, so the issue can't be triggered.
> 
> I think the ether driver needs to call gether_suspend() accordingly. To do this,
> the controller driver needs to explicitly call the suspend() function when it's going
> to be suspended. Could you check whether the patch below fixes the issue?

Thanks for the patch, Xu!  It does fix the hangup but seems to be less
reliable than my/Peter's change (disconnecting gadget), per my testing
on a custom i.MX8MM board.  With your change, host/PC doesn't disconnect
gadget when the board suspends.  After a few suspend cycles, Ethernet
gadget stops working and the following workqueue lockup is seen.  There
seem to be some other bugs?

[  223.047990] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  223.054097] rcu:     1-...0: (7 ticks this GP) idle=bb7c/1/0x4000000000000000 softirq=5368/5370 fqs=2431
[  223.063318] rcu:     (detected by 0, t=5252 jiffies, g=4705, q=2400 ncpus=4)
[  223.070105] Task dump for CPU 1:
[  223.073330] task:systemd-network state:R  running task     stack:0     pid:406   ppid:1      flags:0x00000202
[  223.083248] Call trace:
[  223.085692]  __switch_to+0xc0/0x124
[  246.747996] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 43s!

However, your change seems to work fine on i.MX8MM EVK.  It's probably
due to the fact that host disconnects gadget for some reason when EVK
suspends.  This is a different behavior from the custom board above.
We do not really expect this disconnection, do we?

Shawn

>  ---8<--------------------
> 
> diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
> index 8a9b31fd5c89..27a7674ed62c 100644
> --- a/drivers/usb/chipidea/udc.c
> +++ b/drivers/usb/chipidea/udc.c
> @@ -2367,6 +2367,8 @@ static void udc_id_switch_for_host(struct ci_hdrc *ci)
>  #ifdef CONFIG_PM_SLEEP
>  static void udc_suspend(struct ci_hdrc *ci)
>  {
> +       ci->driver->suspend(&ci->gadget);
> +
>         /*
>          * Set OP_ENDPTLISTADDR to be non-zero for
>          * checking if controller resume from power lost
> @@ -2389,6 +2391,8 @@ static void udc_resume(struct ci_hdrc *ci, bool power_lost)
>         /* Restore value 0 if it was set for power lost check */
>         if (hw_read(ci, OP_ENDPTLISTADDR, ~0) == 0xFFFFFFFF)
>                 hw_write(ci, OP_ENDPTLISTADDR, ~0, 0);
> +
> +       ci->driver->resume(&ci->gadget);
>  }
>  #endif
> 
>  ---->8------------------
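
A side note on the quoted patch: it calls ci->driver->suspend() and
ci->driver->resume() unconditionally.  Assuming ci->driver can be NULL
when no gadget driver is bound, and that the suspend/resume callbacks
are optional for a gadget driver, a guarded call may be safer.  The
sketch below is a user-space mock of that idea only, not kernel code:
every mock_* type and function name is a hypothetical stand-in, and the
real udc_suspend()/udc_resume() do more than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct usb_gadget; only counts callback calls. */
struct mock_gadget {
	int suspend_calls;
	int resume_calls;
};

/* Hypothetical stand-in for struct usb_gadget_driver. */
struct mock_gadget_driver {
	void (*suspend)(struct mock_gadget *g);	/* assumed optional, may be NULL */
	void (*resume)(struct mock_gadget *g);	/* assumed optional, may be NULL */
};

/* Hypothetical stand-in for struct ci_hdrc. */
struct mock_ci {
	struct mock_gadget gadget;
	struct mock_gadget_driver *driver;	/* assumed NULL when nothing is bound */
};

/* Call the driver's suspend callback only if both pointers are valid.
 * Returns true when the callback was actually invoked. */
static bool mock_udc_suspend(struct mock_ci *ci)
{
	if (!ci->driver || !ci->driver->suspend)
		return false;

	ci->driver->suspend(&ci->gadget);
	return true;
}

/* Same guard for the resume path. */
static bool mock_udc_resume(struct mock_ci *ci)
{
	if (!ci->driver || !ci->driver->resume)
		return false;

	ci->driver->resume(&ci->gadget);
	return true;
}

/* Example callbacks that just record that they ran. */
static void count_suspend(struct mock_gadget *g)
{
	g->suspend_calls++;
}

static void count_resume(struct mock_gadget *g)
{
	g->resume_calls++;
}
```

With no driver bound, or with a driver that leaves a callback NULL, the
mock helpers return false and touch nothing; with count_suspend() and
count_resume() installed, each helper invokes its callback exactly once.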