[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240903234121.2ljm62u2ldb54h72@synopsys.com>
Date: Tue, 3 Sep 2024 23:41:27 +0000
From: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
To: Selvarasu Ganesan <selvarasu.g@...sung.com>
CC: Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"jh0801.jung@...sung.com" <jh0801.jung@...sung.com>,
"dh10.jung@...sung.com" <dh10.jung@...sung.com>,
"naushad@...sung.com" <naushad@...sung.com>,
"akash.m5@...sung.com" <akash.m5@...sung.com>,
"rc93.raju@...sung.com" <rc93.raju@...sung.com>,
"taehyun.cho@...sung.com" <taehyun.cho@...sung.com>,
"hongpooh.kim@...sung.com" <hongpooh.kim@...sung.com>,
"eomji.oh@...sung.com" <eomji.oh@...sung.com>,
"shijie.cai@...sung.com" <shijie.cai@...sung.com>
Subject: Re: [PATCH] usb: dwc3: Potential fix of possible dwc3 interrupt storm
On Mon, Sep 02, 2024, Selvarasu Ganesan wrote:
>
> On 8/31/2024 6:20 AM, Thinh Nguyen wrote:
> > Hi Selvarasu,
> >
> > On Fri, Aug 30, 2024, Selvarasu Ganesan wrote:
> >> On 8/10/2024 5:12 AM, Thinh Nguyen wrote:
> >>> On Thu, Aug 08, 2024, Selvarasu Ganesan wrote:
> >>>> On 8/8/2024 6:45 AM, Thinh Nguyen wrote:
> >>>>> On Wed, Aug 07, 2024, Selvarasu Ganesan wrote:
> >>>>>> On 8/7/2024 6:08 AM, Thinh Nguyen wrote:
> >>>>>>> On Fri, Jul 19, 2024, Selvarasu Ganesan wrote:
> >>>>>>>> In certain scenarios, there is a chance that the CPU may not be
> >>>>>>>> scheduled the bottom half of dwc3 interrupt. This is because the CPU
> >>>>>>>> may hang up where any work queue lockup has happened for the same CPU
> >>>>>>>> that is trying to schedule the dwc3 thread interrupt. In this scenario,
> >>>>>>>> the USB can enter runtime suspend as the bus may idle for a longer time
> >>>>>>>> , or user can reconnect the USB cable. Then, the dwc3 event interrupt
> >>>>>>>> can be enabled when runtime resume is happening with regardless of the
> >>>>>>>> previous event status. This can lead to a dwc3 IRQ storm due to the
> >>>>>>>> return from the interrupt handler by checking only the evt->flags as
> >>>>>>>> DWC3_EVENT_PENDING, where the same flag was set as DWC3_EVENT_PENDING
> >>>>>>>> in previous work queue lockup.
> >>>>>>>> Let's consider the following sequences in this scenario,
> >>>>>>>>
> >>>>>>>> Call trace of dwc3 IRQ after workqueue lockup scenario
> >>>>>>>> ======================================================
> >>>>>>>> IRQ #1:
> >>>>>>>> ->dwc3_interrupt()
> >>>>>>>> ->dwc3_check_event_buf()
> >>>>>>>> ->if (evt->flags & DWC3_EVENT_PENDING)
> >>>>>>>> return IRQ_HANDLED;
> >>>>>>>> ->evt->flags |= DWC3_EVENT_PENDING;
> >>>>>>>> ->/* Disable interrupt by setting DWC3_GEVNTSIZ_INTMASK in
> >>>>>>>> DWC3_GEVNTSIZ
> >>>>>>>> ->return IRQ_WAKE_THREAD; // No workqueue scheduled for dwc3
> >>>>>>>> thread_fu due to workqueue lockup
> >>>>>>>> even after return IRQ_WAKE_THREAD
> >>>>>>>> from top-half.
> >>>>>>>>
> >>>>>>>> Thread #2:
> >>>>>>>> ->dwc3_runtime_resume()
> >>>>>>>> ->dwc3_resume_common()
> >>>>>>>> ->dwc3_gadget_resume()
> >>>>>>>> ->dwc3_gadget_soft_connect()
> >>>>>>>> ->dwc3_event_buffers_setup()
> >>>>>>>> ->/*Enable interrupt by clearing DWC3_GEVNTSIZ_INTMASK in
> >>>>>>>> DWC3_GEVNTSIZ*/
> >>>>>>>>
> >>>>>>>> Start IRQ Storming after enable dwc3 event in resume path
> >>>>>>>> =========================================================
> >>>>>>>> CPU0: IRQ
> >>>>>>>> dwc3_interrupt()
> >>>>>>>> dwc3_check_event_buf()
> >>>>>>>> if (evt->flags & DWC3_EVENT_PENDING)
> >>>>>>>> return IRQ_HANDLED;
> >>>>>>>>
> >>>>>>>> CPU0: IRQ
> >>>>>>>> dwc3_interrupt()
> >>>>>>>> dwc3_check_event_buf()
> >>>>>>>> if (evt->flags & DWC3_EVENT_PENDING)
> >>>>>>>> return IRQ_HANDLED;
> >>>>>>>> ..
> >>>>>>>> ..
> >>>>>>>>
> >>>>>>>> To fix this issue by avoiding enabling of the dwc3 event interrupt in
> >>>>>>>> the runtime resume path if dwc3 event processing is in progress.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Selvarasu Ganesan <selvarasu.g@...sung.com>
> >>>>>>>> ---
> >>>>>>>> drivers/usb/dwc3/core.c | 8 ++++++--
> >>>>>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> >>>>>>>> index cb82557678dd..610792a70805 100644
> >>>>>>>> --- a/drivers/usb/dwc3/core.c
> >>>>>>>> +++ b/drivers/usb/dwc3/core.c
> >>>>>>>> @@ -549,8 +549,12 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc)
> >>>>>>>> lower_32_bits(evt->dma));
> >>>>>>>> dwc3_writel(dwc->regs, DWC3_GEVNTADRHI(0),
> >>>>>>>> upper_32_bits(evt->dma));
> >>>>>>>> - dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
> >>>>>>>> - DWC3_GEVNTSIZ_SIZE(evt->length));
> >>>>>>>> +
> >>>>>>>> + /* Skip enable dwc3 event interrupt if event is processing in middle */
> >>>>>>>> + if (!(evt->flags & DWC3_EVENT_PENDING))
> >>>>>>>> + dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
> >>>>>>>> + DWC3_GEVNTSIZ_SIZE(evt->length));
> >>>>>>>> +
> >>>>>>>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0);
> >>>>>>>>
> >>>>>>>> return 0;
> >>>>>>>> --
> >>>>>>>> 2.17.1
> >>>>>>>>
> >>>>>>> We're not waking up from a hibernation. So after a soft-reset and
> >>>>>>> resume, the events that weren't processed are stale. They should be
> >>>>>>> processed prior to entering suspend or be discarded before resume.
> >>>>>>>
> >>>>>>> The synchronize_irq() during suspend() was not sufficient to prevent
> >>>>>>> this? What are we missing here.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Thinh
> >>>>>> I don’t think the triggering of interrupt would not be stopped even if
> >>>>>> do soft reset. It's because of event count is may be valid .
> >>>>> Ok. I think I see what you're referring to when you say "event is
> >>>>> processing in the middle" now.
> >>>>>
> >>>>> What you want to check is probably this in dwc3_event_buffers_setup().
> >>>>> Please confirm:
> >>>>>
> >>>>> if (dwc->pending_events)
> >>>>> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
> >>>>> DWC3_GEVNTSIZ_INTMASK | DWC3_GEVNTSIZ_SIZE(evt->length));
> >>>>> else
> >>>>> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), DWC3_GEVNTSIZ_SIZE(evt->length));
> >>>> Yes, we are expecting the same. But, we must verify the status of
> >>>> evt->flags, which will indicate whether the event is currently
> >>>> processing in middle or not. The below code is for the reference.
> >>>>
> >>>> if (!(evt->flags & DWC3_EVENT_PENDING))
> >>>> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
> >>>> DWC3_GEVNTSIZ_SIZE(evt->length));
> >>>> else
> >>>> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
> >>>> DWC3_GEVNTSIZ_INTMASK | DWC3_GEVNTSIZ_SIZE(evt->length));
> >>> So, this happens while pending_events is set right? I need to review
> >>> this runtime suspend flow next week. Something doesn't look right. When
> >>> there's a suspend/resume runtime or not, there's a soft disconnect. We
> >>> shouldn't be processing any event prior to going into suspend. Also, we
> >>> shouldn't be doing soft-disconnect while connected and in operation
> >>> unless we specifically tell it to.
> >> HI Thinh,
> >>
> >> Would you be able to review this runtime suspend flow?
> >>
> >> From our end, after conducting multiple regression tests, we have
> >> determined that the resetting of "evt->flags" are sufficient when
> >> attempting to enable event IRQ masks instead of enable event IRQ mask
> >> based on pending event flags. There is a possibility that reconnecting
> >> USB with the host PC may cause event interrupts to be missed by the CPU
> >> if disable event IRQ mask. So, The fix should be as follow. Could you
> >> please review this once from your end?
> >>
> >> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> >> index ccc3895dbd7f..3b2441608e9e 100644
> >> --- a/drivers/usb/dwc3/core.c
> >> +++ b/drivers/usb/dwc3/core.c
> >> @@ -554,6 +554,15 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc)
> >> lower_32_bits(evt->dma));
> >> dwc3_writel(dwc->regs, DWC3_GEVNTADRHI(0),
> >> upper_32_bits(evt->dma));
> >> +
> >> + /*
> >> + * The DWC3_EVENT_PENDING flag is cleared if it has
> >> + * already been set when enabling the event IRQ mask
> >> + * to prevent possibly of an IRQ storm.
> >> + */
> >> + if (evt->flags & DWC3_EVENT_PENDING)
> >> + evt->flags &= ~DWC3_EVENT_PENDING;
> >> +
> >> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
> >> DWC3_GEVNTSIZ_SIZE(evt->length));
> >> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0);
> >>
> > Sorry for the delay response.
> >
> > In addition to that, please rework and rename
> > dwc3_gadget_process_pending_event(). We're not supposed to handle any
> > left-over event. So remove the manual calls to the interrupt handlers
> > there.
> Hi Thinh,
>
> Thanks for your comments.
>
> Regarding the handling of leftover events during dwc3 runtime resume, I
> understand that we are not supposed to handle any leftover events. Would
Actually, there's no leftover event.
> you be interested in making changes to the code as suggested below? The
> reason for removing interrupt handlers from
> dwc3_gadget_process_pending_events() is to avoid handling any leftover
> events from the last suspend right. If so, based on my understanding, we
> can simply remove the use of dwc3_gadget_process_pending_events()
> instead of rework on this function since it is not necessary if we
> remove the call to interrupt handlers from there.
That's right.
>
> I would like to reconfirm from our end that in our failure scenario, we
> observe that DWC3_EVENT_PENDING is set in evt->flags when the dwc3
> resume sequence is executed, and the dwc->pending_events flag is not
> being set.
>
>
>
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index ccc3895dbd7f..63e8dd24ad0e 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -550,6 +550,15 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc)
>
> evt = dwc->ev_buf;
> evt->lpos = 0;
> +
> + /*
> + * The DWC3_EVENT_PENDING flag is cleared if it has
> + * already been set when enabling the event IRQ mask
> + * to prevent possibly of an IRQ storm.
> + */
> + if (evt->flags & DWC3_EVENT_PENDING)
> + evt->flags &= ~DWC3_EVENT_PENDING;
> +
We actually don't need to clear DWCC3_EVENT_PENDING flag if we remove
the below.
> dwc3_writel(dwc->regs, DWC3_GEVNTADRLO(0),
> lower_32_bits(evt->dma));
> dwc3_writel(dwc->regs, DWC3_GEVNTADRHI(0),
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 89fc690fdf34..951c805337c2 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -4739,8 +4739,6 @@ int dwc3_gadget_resume(struct dwc3 *dwc)
> void dwc3_gadget_process_pending_events(struct dwc3 *dwc)
> {
> if (dwc->pending_events) {
> - dwc3_interrupt(dwc->irq_gadget, dwc->ev_buf);
> - dwc3_thread_interrupt(dwc->irq_gadget, dwc->ev_buf);
> pm_runtime_put(dwc->dev);
> dwc->pending_events = false;
> enable_irq(dwc->irq_gadget);
>
>
> Thanks,
> Selva
>
> >
> > On runtime suspend, the device is soft disconnected. So any interrupt
> > assertion to notify a new connection must be a custom configuration of
> > your platform. No event should be generated while the run_stop bit is
> > cleared.
> >
> > On runtime resume, we will initiate soft-reset and soft-connect to
> > restore the run_stop bit. A new connection event will be generated then.
>
> Agree.
>
Please try the following with some cleanup and additional comments:
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index ccc3895dbd7f..f02dae464efe 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -2484,7 +2484,11 @@ static int dwc3_runtime_resume(struct device *dev)
switch (dwc->current_dr_role) {
case DWC3_GCTL_PRTCAP_DEVICE:
- dwc3_gadget_process_pending_events(dwc);
+ if (dwc->pending_events) {
+ pm_runtime_put(dwc->dev);
+ dwc->pending_events = false;
+ enable_irq(dwc->irq_gadget);
+ }
break;
case DWC3_GCTL_PRTCAP_HOST:
default:
@@ -2574,6 +2578,14 @@ static void dwc3_complete(struct device *dev)
static const struct dev_pm_ops dwc3_dev_pm_ops = {
SET_SYSTEM_SLEEP_PM_OPS(dwc3_suspend, dwc3_resume)
.complete = dwc3_complete,
+
+ /*
+ * Current implementation for runtime suspend stops the controller on
+ * disconnection. It relies on platforms with custom connection
+ * interrupt to notify the driver to start the controller again. This
+ * flow is platform specific implemenation and not part of the
+ * controller's programming flow.
+ */
SET_RUNTIME_PM_OPS(dwc3_runtime_suspend, dwc3_runtime_resume,
dwc3_runtime_idle)
};
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 1e561fd8b86e..2fa3fd55eb32 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1673,7 +1673,6 @@ static inline void dwc3_otg_host_init(struct dwc3 *dwc)
#if !IS_ENABLED(CONFIG_USB_DWC3_HOST)
int dwc3_gadget_suspend(struct dwc3 *dwc);
int dwc3_gadget_resume(struct dwc3 *dwc);
-void dwc3_gadget_process_pending_events(struct dwc3 *dwc);
#else
static inline int dwc3_gadget_suspend(struct dwc3 *dwc)
{
@@ -1684,10 +1683,6 @@ static inline int dwc3_gadget_resume(struct dwc3 *dwc)
{
return 0;
}
-
-static inline void dwc3_gadget_process_pending_events(struct dwc3 *dwc)
-{
-}
#endif /* !IS_ENABLED(CONFIG_USB_DWC3_HOST) */
#if IS_ENABLED(CONFIG_USB_DWC3_ULPI)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..ea97e9c1d1bd 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -4735,14 +4735,3 @@ int dwc3_gadget_resume(struct dwc3 *dwc)
return dwc3_gadget_soft_connect(dwc);
}
-
-void dwc3_gadget_process_pending_events(struct dwc3 *dwc)
-{
- if (dwc->pending_events) {
- dwc3_interrupt(dwc->irq_gadget, dwc->ev_buf);
- dwc3_thread_interrupt(dwc->irq_gadget, dwc->ev_buf);
- pm_runtime_put(dwc->dev);
- dwc->pending_events = false;
- enable_irq(dwc->irq_gadget);
- }
-}
--
BR,
Thinh
Powered by blists - more mailing lists