Message-ID: <vgmbyikra4stccdspthialbaqwicf4bx5hnah2ufgzlrym6qvk@fqx6357jfep3>
Date: Fri, 13 Jun 2025 11:13:15 +0800
From: Xu Yang <xu.yang_2@....com>
To: John Ernberg <john.ernberg@...ia.se>
Cc: Shawn Guo <shawnguo@...nel.org>, Peter Chen <peter.chen@...nel.org>,
Shawn Guo <shawnguo2@...h.net>, "imx@...ts.linux.dev" <imx@...ts.linux.dev>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>
Subject: Re: i.MX kernel hangup caused by chipidea USB gadget driver
On Thu, Jun 12, 2025 at 01:23:29PM +0000, John Ernberg wrote:
> Hi Xu, Shawn,
>
> On 6/10/25 1:30 PM, Xu Yang wrote:
> > Hi John,
> >
> > On Mon, Jun 09, 2025 at 02:17:30PM +0000, John Ernberg wrote:
> >> Hi Shawn, Xu,
> >>
> >> On Mon, Jun 09, 2025 at 07:53:22PM +0800, Xu Yang wrote:
> >>> Hi Shawn,
> >>>
> >>> Thanks for your reports!
> >>>
> >>> On Mon, Jun 09, 2025 at 01:31:06PM +0800, Shawn Guo wrote:
> >>>> Hi Xu, Peter,
> >>>>
> >>>> I'm seeing a kernel hangup on imx8mm-evk board. It happens when:
> >>>>
> >>>> - USB gadget is enabled as Ethernet
> >>>> - There is data transfer over USB Ethernet
> >>>> - Device is going in/out suspend
> >>>>
> >>>> A simple way to reproduce the issue could be:
> >>>>
> >>>> 1. Copy a big file (like 500MB) from host PC to device with scp
> >>>>
> >>>> 2. While the file copy is ongoing, suspend & resume the device like:
> >>>>
> >>>> $ echo +3 > /sys/class/rtc/rtc0/wakealarm; echo mem > /sys/power/state
> >>>>
> >>>> 3. The device will hang up there
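Long soak runs, like the 12-hour test reported later in this thread, need steps 1-3 automated. Below is a minimal userspace sketch in C. It assumes the RTC is rtc0 and that the program runs as root while the scp copy is in progress. The helper names write_attr() and suspend_resume_cycles() are hypothetical and are not part of any kernel tool. Each cycle performs the same two writes as the echo command above.

```c
#include <stdio.h>

/* Hypothetical helper: write a string to a file such as a sysfs
 * attribute, like `echo`. Returns 0 on success, -1 on failure.
 * stdio buffers the data until fclose(), so its result is checked too. */
static int write_attr(const char *path, const char *s)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(s, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Hypothetical helper: run n suspend/resume cycles as in step 2.
 * Each cycle arms the RTC alarm 3 s ahead, then requests suspend-to-RAM.
 * On real hardware the write to the state file returns only after the
 * system has resumed. Stops at the first failed write. */
static int suspend_resume_cycles(const char *wakealarm, const char *state,
				 int n)
{
	for (int i = 0; i < n; i++) {
		if (write_attr(wakealarm, "+3\n") != 0 ||
		    write_attr(state, "mem\n") != 0) {
			fprintf(stderr, "cycle %d failed\n", i);
			return -1;
		}
	}
	return 0;
}
```

On the board, the sketch would be called as suspend_resume_cycles("/sys/class/rtc/rtc0/wakealarm", "/sys/power/state", 1000). Unlike a bare shell loop, it stops as soon as a write fails instead of silently continuing.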
> >>>>
> >>>> I reproduced on the following kernels:
> >>>>
> >>>> - Mainline kernel
> >>>> - NXP kernel lf-6.6.y
> >>>> - NXP kernel lf-6.12.y
> >>>>
> >>>> But NXP kernel lf-6.1.y doesn't have this problem. I tracked it down to
> >>>> Peter's commit [1] on lf-6.1.y, and found that the gadget disconnect &
> >>>> connect calls got lost from suspend & resume hooks when the commit was
> >>>> split and pushed upstream. I confirm that adding the calls back fixes
> >>>> the hangup.
> >>
> >> We probably ran into the same problem trying to bring onboard 6.12, going
> >> from 6.1, on iMX8QXP. I managed to trace the hang to EP priming through a
> >> combination of debug tracing and BUG_ON experiments. See if it starts
> >> splatting with the change below.
> >>
> >> ----------------->8------------------
> >>
> >> From 092599ab6f9e20412a7ca1eb118dd2be80cd18ff Mon Sep 17 00:00:00 2001
> >> From: John Ernberg <john.ernberg@...ia.se>
> >> Date: Mon, 5 May 2025 09:09:01 +0200
> >> Subject: [PATCH] USB: ci: gadget: Panic if priming when gadget off
> >>
> >> ---
> >> drivers/usb/chipidea/udc.c | 4 +++-
> >> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
> >> index 2fea263a5e30..544aa4fa2d1d 100644
> >> --- a/drivers/usb/chipidea/udc.c
> >> +++ b/drivers/usb/chipidea/udc.c
> >> @@ -203,8 +203,10 @@ static int hw_ep_prime(struct ci_hdrc *ci, int num, int dir, int is_ctrl)
> >>
> >> hw_write(ci, OP_ENDPTPRIME, ~0, BIT(n));
> >>
> >> - while (hw_read(ci, OP_ENDPTPRIME, BIT(n)))
> >> + while (hw_read(ci, OP_ENDPTPRIME, BIT(n))) {
> >> cpu_relax();
> >> + BUG_ON(dir == TX && !hw_read(ci, OP_ENDPTCTRL + num, ENDPTCTRL_TXE));
> >> + }
> >> if (is_ctrl && dir == RX && hw_read(ci, OP_ENDPTSETUPSTAT, BIT(num)))
> >> return -EAGAIN;
> >>
> >> ----------------->8------------------
> >>
> >> On the iMX8QXP you may additionally run into asynchronous aborts and SError
> >> due to resources being disabled.
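The hang John traced sits in hw_ep_prime(). After writing the endpoint bit to OP_ENDPTPRIME, the driver spins with cpu_relax() until the controller clears that bit. Suppose the endpoint or controller has been shut down underneath it during system suspend, which the BUG_ON probe checks for via ENDPTCTRL_TXE. Then the bit may never clear, and the CPU spins forever. The userspace model below illustrates that failure mode with a simulated register. The names fake_udc, fake_read() and wait_prime_clear() are hypothetical, and the poll budget is an illustrative bounded variant, not the upstream code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register accessor standing in for hw_read(). */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* Hypothetical simulated controller: clears the prime bit after
 * 'countdown' reads while running, never if it has been switched off. */
struct fake_udc {
	uint32_t endptprime;
	int countdown;
	bool running;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_udc *u = ctx;

	if (u->running && u->countdown > 0 && --u->countdown == 0)
		u->endptprime = 0;
	return u->endptprime;
}

/* Bounded variant of the ENDPTPRIME wait: poll until the controller
 * clears 'bit', but give up after max_polls reads instead of spinning
 * forever like the unbounded while-loop in hw_ep_prime(). */
static int wait_prime_clear(reg_read_fn rd, void *ctx, uint32_t bit,
			    unsigned long max_polls)
{
	for (unsigned long i = 0; i < max_polls; i++) {
		if (!(rd(ctx) & bit))
			return 0;	/* hardware finished priming */
	}
	return -ETIMEDOUT;	/* bit never cleared: controller stalled */
}
```

A bounded wait only turns the hang into an error return. The fix proposed in this thread instead avoids priming in the first place by disconnecting the gadget across system suspend.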
> >>
> >>>>
> >>>> ---8<--------------------
> >>>>
> >>>> diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
> >>>> index 8a9b31fd5c89..72329a7eac4d 100644
> >>>> --- a/drivers/usb/chipidea/udc.c
> >>>> +++ b/drivers/usb/chipidea/udc.c
> >>>> @@ -2374,6 +2374,9 @@ static void udc_suspend(struct ci_hdrc *ci)
> >>>> */
> >>>> if (hw_read(ci, OP_ENDPTLISTADDR, ~0) == 0)
> >>>> hw_write(ci, OP_ENDPTLISTADDR, ~0, ~0);
> >>>> +
> >>>> + if (ci->driver && ci->vbus_active && (ci->gadget.state != USB_STATE_SUSPENDED))
> >>>> + usb_gadget_disconnect(&ci->gadget);
> >>>> }
> >>>>
> >>>> static void udc_resume(struct ci_hdrc *ci, bool power_lost)
> >>>> @@ -2384,6 +2387,9 @@ static void udc_resume(struct ci_hdrc *ci, bool power_lost)
> >>>> OTGSC_BSVIS | OTGSC_BSVIE);
> >>>> if (ci->vbus_active)
> >>>> usb_gadget_vbus_disconnect(&ci->gadget);
> >>>> + } else {
> >>>> + if (ci->driver && ci->vbus_active)
> >>>> + usb_gadget_connect(&ci->gadget);
> >>>> }
> >>>>
> >>>> /* Restore value 0 if it was set for power lost check */
> >>>>
> >>>> ---->8------------------
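The two hunks above can be summarised as a small state model. On system suspend, the gadget is disconnected (D+ pull-up off) when a gadget driver is bound, VBUS is present and the bus is not already USB-suspended. On resume without power loss, it is connected again. The sketch below is a simplified userspace model under those assumptions. struct model_ci and the model_* functions are hypothetical; each flag stands in for the ci_hdrc field named in its comment, and the OTGSC interrupt handling in the power-lost path is omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two gadget states the patch cares about. */
enum model_gadget_state { MODEL_STATE_CONFIGURED, MODEL_STATE_SUSPENDED };

/* Hypothetical stand-ins for the fields the patch reads. */
struct model_ci {
	bool driver_bound;		/* ci->driver != NULL */
	bool vbus_active;		/* ci->vbus_active */
	enum model_gadget_state state;	/* ci->gadget.state */
	bool connected;			/* pull-up on, host sees the device */
};

/* Added tail of udc_suspend(): disconnect from the host unless the bus is
 * already suspended, so no transfers get primed while powered down. */
static void model_udc_suspend(struct model_ci *ci)
{
	if (ci->driver_bound && ci->vbus_active &&
	    ci->state != MODEL_STATE_SUSPENDED)
		ci->connected = false;		/* usb_gadget_disconnect() */
}

/* udc_resume(): if power was lost, the existing code ends the VBUS
 * session; otherwise the new else-branch reconnects the gadget. */
static void model_udc_resume(struct model_ci *ci, bool power_lost)
{
	if (power_lost) {
		if (ci->vbus_active)
			ci->connected = false;	/* usb_gadget_vbus_disconnect() */
	} else if (ci->driver_bound && ci->vbus_active) {
		ci->connected = true;		/* usb_gadget_connect() */
	}
}
```

In this model, a normal suspend/resume pair leaves the gadget connected. The only window in which it is disconnected is while the controller is down, which is exactly when hw_ep_prime() must not be reached.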
> >
> > Does above change work for you?
> >
>
> I have run suspend/resume tests for about 12 hours now with this change
> and not had any trouble on iMX8QXP, where it was not possible to run
> such tests for so long before.
>
> Please pick up if you submit this formally:
>
> Tested-by: John Ernberg <john.ernberg@...ia.se> # iMX8QXP
Good to know.
Thanks,
Xu Yang
>
> Thanks! // John Ernberg