lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240903234121.2ljm62u2ldb54h72@synopsys.com>
Date: Tue, 3 Sep 2024 23:41:27 +0000
From: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
To: Selvarasu Ganesan <selvarasu.g@...sung.com>
CC: Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "jh0801.jung@...sung.com" <jh0801.jung@...sung.com>,
        "dh10.jung@...sung.com" <dh10.jung@...sung.com>,
        "naushad@...sung.com" <naushad@...sung.com>,
        "akash.m5@...sung.com" <akash.m5@...sung.com>,
        "rc93.raju@...sung.com" <rc93.raju@...sung.com>,
        "taehyun.cho@...sung.com" <taehyun.cho@...sung.com>,
        "hongpooh.kim@...sung.com" <hongpooh.kim@...sung.com>,
        "eomji.oh@...sung.com" <eomji.oh@...sung.com>,
        "shijie.cai@...sung.com" <shijie.cai@...sung.com>
Subject: Re: [PATCH] usb: dwc3: Potential fix of possible dwc3 interrupt storm

On Mon, Sep 02, 2024, Selvarasu Ganesan wrote:
> 
> On 8/31/2024 6:20 AM, Thinh Nguyen wrote:
> > Hi Selvarasu,
> >
> > On Fri, Aug 30, 2024, Selvarasu Ganesan wrote:
> >> On 8/10/2024 5:12 AM, Thinh Nguyen wrote:
> >>> On Thu, Aug 08, 2024, Selvarasu Ganesan wrote:
> >>>> On 8/8/2024 6:45 AM, Thinh Nguyen wrote:
> >>>>> On Wed, Aug 07, 2024, Selvarasu Ganesan wrote:
> >>>>>> On 8/7/2024 6:08 AM, Thinh Nguyen wrote:
> >>>>>>> On Fri, Jul 19, 2024, Selvarasu Ganesan wrote:
> >>>>>>>> In certain scenarios, there is a chance that the CPU may not be
> >>>>>>>> scheduled the bottom half of dwc3 interrupt. This is because the CPU
> >>>>>>>> may hang up where any work queue lockup has happened for the same CPU
> >>>>>>>> that is trying to schedule the dwc3 thread interrupt. In this scenario,
> >>>>>>>> the USB can enter runtime suspend as the bus may idle for a longer time
> >>>>>>>> , or user can reconnect the USB cable. Then, the dwc3 event interrupt
> >>>>>>>> can be enabled when runtime resume is happening with regardless of the
> >>>>>>>> previous event status. This can lead to a dwc3 IRQ storm due to the
> >>>>>>>> return from the interrupt handler by checking only the evt->flags as
> >>>>>>>> DWC3_EVENT_PENDING, where the same flag was set as DWC3_EVENT_PENDING
> >>>>>>>> in previous work queue lockup.
> >>>>>>>> Let's consider the following sequences in this scenario,
> >>>>>>>>
> >>>>>>>> Call trace of dwc3 IRQ after workqueue lockup scenario
> >>>>>>>> ======================================================
> >>>>>>>> IRQ #1:
> >>>>>>>> ->dwc3_interrupt()
> >>>>>>>>       ->dwc3_check_event_buf()
> >>>>>>>>             ->if (evt->flags & DWC3_EVENT_PENDING)
> >>>>>>>>                          return IRQ_HANDLED;
> >>>>>>>>             ->evt->flags |= DWC3_EVENT_PENDING;
> >>>>>>>>             ->/* Disable interrupt by setting DWC3_GEVNTSIZ_INTMASK  in
> >>>>>>>>                                                             DWC3_GEVNTSIZ
> >>>>>>>>             ->return IRQ_WAKE_THREAD; // No workqueue scheduled for dwc3
> >>>>>>>>                                          thread_fu due to workqueue lockup
> >>>>>>>>                                          even after return IRQ_WAKE_THREAD
> >>>>>>>>                                          from top-half.
> >>>>>>>>
> >>>>>>>> Thread #2:
> >>>>>>>> ->dwc3_runtime_resume()
> >>>>>>>>      ->dwc3_resume_common()
> >>>>>>>>        ->dwc3_gadget_resume()
> >>>>>>>>           ->dwc3_gadget_soft_connect()
> >>>>>>>>             ->dwc3_event_buffers_setup()
> >>>>>>>>                ->/*Enable interrupt by clearing  DWC3_GEVNTSIZ_INTMASK in
> >>>>>>>>                                                             DWC3_GEVNTSIZ*/
> >>>>>>>>
> >>>>>>>> Start IRQ Storming after enable dwc3 event in resume path
> >>>>>>>> =========================================================
> >>>>>>>> CPU0: IRQ
> >>>>>>>> dwc3_interrupt()
> >>>>>>>>      dwc3_check_event_buf()
> >>>>>>>>             if (evt->flags & DWC3_EVENT_PENDING)
> >>>>>>>>              return IRQ_HANDLED;
> >>>>>>>>
> >>>>>>>> CPU0: IRQ
> >>>>>>>> dwc3_interrupt()
> >>>>>>>>      dwc3_check_event_buf()
> >>>>>>>>             if (evt->flags & DWC3_EVENT_PENDING)
> >>>>>>>>              return IRQ_HANDLED;
> >>>>>>>> ..
> >>>>>>>> ..
> >>>>>>>>
> >>>>>>>> To fix this issue by avoiding enabling of the dwc3 event interrupt in
> >>>>>>>> the runtime resume path if dwc3 event processing is in progress.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Selvarasu Ganesan <selvarasu.g@...sung.com>
> >>>>>>>> ---
> >>>>>>>>      drivers/usb/dwc3/core.c | 8 ++++++--
> >>>>>>>>      1 file changed, 6 insertions(+), 2 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> >>>>>>>> index cb82557678dd..610792a70805 100644
> >>>>>>>> --- a/drivers/usb/dwc3/core.c
> >>>>>>>> +++ b/drivers/usb/dwc3/core.c
> >>>>>>>> @@ -549,8 +549,12 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc)
> >>>>>>>>      			lower_32_bits(evt->dma));
> >>>>>>>>      	dwc3_writel(dwc->regs, DWC3_GEVNTADRHI(0),
> >>>>>>>>      			upper_32_bits(evt->dma));
> >>>>>>>> -	dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
> >>>>>>>> -			DWC3_GEVNTSIZ_SIZE(evt->length));
> >>>>>>>> +
> >>>>>>>> +	/* Skip enable dwc3 event interrupt if event is processing in middle */
> >>>>>>>> +	if (!(evt->flags & DWC3_EVENT_PENDING))
> >>>>>>>> +		dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
> >>>>>>>> +				DWC3_GEVNTSIZ_SIZE(evt->length));
> >>>>>>>> +
> >>>>>>>>      	dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0);
> >>>>>>>>      
> >>>>>>>>      	return 0;
> >>>>>>>> -- 
> >>>>>>>> 2.17.1
> >>>>>>>>
> >>>>>>> We're not waking up from a hibernation. So after a soft-reset and
> >>>>>>> resume, the events that weren't processed are stale. They should be
> >>>>>>> processed prior to entering suspend or be discarded before resume.
> >>>>>>>
> >>>>>>> The synchronize_irq() during suspend() was not sufficient to prevent
> >>>>>>> this? What are we missing here.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Thinh
> >>>>>> I don’t think the triggering of interrupt would not be stopped even if
> >>>>>> do soft reset. It's because of event count is may be valid .
> >>>>> Ok. I think I see what you're referring to when you say "event is
> >>>>> processing in the middle" now.
> >>>>>
> >>>>> What you want to check is probably this in dwc3_event_buffers_setup().
> >>>>> Please confirm:
> >>>>>
> >>>>> if (dwc->pending_events)
> >>>>> 	dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
> >>>>> 			DWC3_GEVNTSIZ_INTMASK | DWC3_GEVNTSIZ_SIZE(evt->length));
> >>>>> else
> >>>>> 	dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), DWC3_GEVNTSIZ_SIZE(evt->length));
> >>>> Yes, we are expecting the same. But, we must verify the status of
> >>>> evt->flags, which will indicate whether the event is currently
> >>>> processing in middle or not. The below code is for the reference.
> >>>>
> >>>> if (!(evt->flags & DWC3_EVENT_PENDING))
> >>>> 	dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
> >>>> 			 DWC3_GEVNTSIZ_SIZE(evt->length));
> >>>> else
> >>>> 	dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
> >>>> 			DWC3_GEVNTSIZ_INTMASK | DWC3_GEVNTSIZ_SIZE(evt->length));
> >>> So, this happens while pending_events is set right? I need to review
> >>> this runtime suspend flow next week. Something doesn't look right. When
> >>> there's a suspend/resume runtime or not, there's a soft disconnect. We
> >>> shouldn't be processing any event prior to going into suspend. Also, we
> >>> shouldn't be doing soft-disconnect while connected and in operation
> >>> unless we specifically tell it to.
> >> HI Thinh,
> >>
> >> Would you be able to review this runtime suspend flow?
> >>
> >>   From our end, after conducting multiple regression tests, we have
> >> determined that the resetting of "evt->flags" are sufficient when
> >> attempting to enable event IRQ masks instead of enable event IRQ mask
> >> based on pending event flags. There is a possibility that reconnecting
> >> USB with the host PC may cause event interrupts to be missed by the CPU
> >> if disable event IRQ mask.  So, The fix should be as follow. Could you
> >> please review this once from your end?
> >>
> >> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> >> index ccc3895dbd7f..3b2441608e9e 100644
> >> --- a/drivers/usb/dwc3/core.c
> >> +++ b/drivers/usb/dwc3/core.c
> >> @@ -554,6 +554,15 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc)
> >>                           lower_32_bits(evt->dma));
> >>           dwc3_writel(dwc->regs, DWC3_GEVNTADRHI(0),
> >>                           upper_32_bits(evt->dma));
> >> +
> >> +       /*
> >> +        * The DWC3_EVENT_PENDING flag is cleared if it has
> >> +        * already been set when enabling the event IRQ mask
> >> +        * to prevent possibly of an IRQ storm.
> >> +        */
> >> +       if (evt->flags & DWC3_EVENT_PENDING)
> >> +               evt->flags &= ~DWC3_EVENT_PENDING;
> >> +
> >>           dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
> >>                           DWC3_GEVNTSIZ_SIZE(evt->length));
> >>           dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0);
> >>
> > Sorry for the delay response.
> >
> > In addition to that, please rework and rename
> > dwc3_gadget_process_pending_event(). We're not supposed to handle any
> > left-over event. So remove the manual calls to the interrupt handlers
> > there.
> Hi Thinh,
> 
> Thanks for your comments.
> 
> Regarding the handling of leftover events during dwc3 runtime resume, I 
> understand that we are not supposed to handle any leftover events. Would 

Actually, there's no leftover event.

> you be interested in making changes to the code as suggested below? The 
> reason for removing interrupt handlers from 
> dwc3_gadget_process_pending_events() is to avoid handling any leftover 
> events from the last suspend right. If so, based on my understanding, we 
> can simply remove the use of dwc3_gadget_process_pending_events() 
> instead of rework on this function since it is not necessary if we 
> remove the call to interrupt handlers from there.

That's right.

> 
> I would like to reconfirm from our end that in our failure scenario, we 
> observe that DWC3_EVENT_PENDING is set in evt->flags when the dwc3 
> resume sequence is executed, and the dwc->pending_events flag is not 
> being set.
> 
> 
> 
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index ccc3895dbd7f..63e8dd24ad0e 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -550,6 +550,15 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc)
> 
>          evt = dwc->ev_buf;
>          evt->lpos = 0;
> +
> +       /*
> +        * The DWC3_EVENT_PENDING flag is cleared if it has
> +        * already been set when enabling the event IRQ mask
> +        * to prevent possibly of an IRQ storm.
> +        */
> +       if (evt->flags & DWC3_EVENT_PENDING)
> +               evt->flags &= ~DWC3_EVENT_PENDING;
> +

We actually don't need to clear DWCC3_EVENT_PENDING flag if we remove
the below.

>          dwc3_writel(dwc->regs, DWC3_GEVNTADRLO(0),
> lower_32_bits(evt->dma));
>          dwc3_writel(dwc->regs, DWC3_GEVNTADRHI(0),
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 89fc690fdf34..951c805337c2 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -4739,8 +4739,6 @@ int dwc3_gadget_resume(struct dwc3 *dwc)
>   void dwc3_gadget_process_pending_events(struct dwc3 *dwc)
>   {
>          if (dwc->pending_events) {
> - dwc3_interrupt(dwc->irq_gadget, dwc->ev_buf);
> - dwc3_thread_interrupt(dwc->irq_gadget, dwc->ev_buf);
>                  pm_runtime_put(dwc->dev);
>                  dwc->pending_events = false;
> enable_irq(dwc->irq_gadget);
> 
> 
> Thanks,
> Selva
> 
> >
> > On runtime suspend, the device is soft disconnected. So any interrupt
> > assertion to notify a new connection must be a custom configuration of
> > your platform. No event should be generated while the run_stop bit is
> > cleared.
> >
> > On runtime resume, we will initiate soft-reset and soft-connect to
> > restore the run_stop bit. A new connection event will be generated then.
> 
> Agree.
> 

Please try the following with some cleanup and additional comments:

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index ccc3895dbd7f..f02dae464efe 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -2484,7 +2484,11 @@ static int dwc3_runtime_resume(struct device *dev)
 
 	switch (dwc->current_dr_role) {
 	case DWC3_GCTL_PRTCAP_DEVICE:
-		dwc3_gadget_process_pending_events(dwc);
+		if (dwc->pending_events) {
+			pm_runtime_put(dwc->dev);
+			dwc->pending_events = false;
+			enable_irq(dwc->irq_gadget);
+		}
 		break;
 	case DWC3_GCTL_PRTCAP_HOST:
 	default:
@@ -2574,6 +2578,14 @@ static void dwc3_complete(struct device *dev)
 static const struct dev_pm_ops dwc3_dev_pm_ops = {
 	SET_SYSTEM_SLEEP_PM_OPS(dwc3_suspend, dwc3_resume)
 	.complete = dwc3_complete,
+
+	/*
+	 * Current implementation for runtime suspend stops the controller on
+	 * disconnection. It relies on platforms with custom connection
+	 * interrupt to notify the driver to start the controller again. This
+	 * flow is platform specific implemenation and not part of the
+	 * controller's programming flow.
+	 */
 	SET_RUNTIME_PM_OPS(dwc3_runtime_suspend, dwc3_runtime_resume,
 			dwc3_runtime_idle)
 };
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 1e561fd8b86e..2fa3fd55eb32 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1673,7 +1673,6 @@ static inline void dwc3_otg_host_init(struct dwc3 *dwc)
 #if !IS_ENABLED(CONFIG_USB_DWC3_HOST)
 int dwc3_gadget_suspend(struct dwc3 *dwc);
 int dwc3_gadget_resume(struct dwc3 *dwc);
-void dwc3_gadget_process_pending_events(struct dwc3 *dwc);
 #else
 static inline int dwc3_gadget_suspend(struct dwc3 *dwc)
 {
@@ -1684,10 +1683,6 @@ static inline int dwc3_gadget_resume(struct dwc3 *dwc)
 {
 	return 0;
 }
-
-static inline void dwc3_gadget_process_pending_events(struct dwc3 *dwc)
-{
-}
 #endif /* !IS_ENABLED(CONFIG_USB_DWC3_HOST) */
 
 #if IS_ENABLED(CONFIG_USB_DWC3_ULPI)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..ea97e9c1d1bd 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -4735,14 +4735,3 @@ int dwc3_gadget_resume(struct dwc3 *dwc)
 
 	return dwc3_gadget_soft_connect(dwc);
 }
-
-void dwc3_gadget_process_pending_events(struct dwc3 *dwc)
-{
-	if (dwc->pending_events) {
-		dwc3_interrupt(dwc->irq_gadget, dwc->ev_buf);
-		dwc3_thread_interrupt(dwc->irq_gadget, dwc->ev_buf);
-		pm_runtime_put(dwc->dev);
-		dwc->pending_events = false;
-		enable_irq(dwc->irq_gadget);
-	}
-}


--
BR,
Thinh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ