lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240905211338.omst6jr3okbxkqdh@synopsys.com>
Date: Thu, 5 Sep 2024 21:13:48 +0000
From: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
To: Selvarasu Ganesan <selvarasu.g@...sung.com>
CC: Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "jh0801.jung@...sung.com" <jh0801.jung@...sung.com>,
        "dh10.jung@...sung.com" <dh10.jung@...sung.com>,
        "naushad@...sung.com" <naushad@...sung.com>,
        "akash.m5@...sung.com" <akash.m5@...sung.com>,
        "rc93.raju@...sung.com" <rc93.raju@...sung.com>,
        "taehyun.cho@...sung.com" <taehyun.cho@...sung.com>,
        "hongpooh.kim@...sung.com" <hongpooh.kim@...sung.com>,
        "eomji.oh@...sung.com" <eomji.oh@...sung.com>,
        "shijie.cai@...sung.com" <shijie.cai@...sung.com>
Subject: Re: [PATCH] usb: dwc3: Potential fix of possible dwc3 interrupt storm

On Thu, Sep 05, 2024, Selvarasu Ganesan wrote:
> 
> On 9/5/2024 5:56 AM, Thinh Nguyen wrote:
> > On Wed, Sep 04, 2024, Selvarasu Ganesan wrote:
> >> On 9/4/2024 6:33 AM, Thinh Nguyen wrote:
> >>> On Mon, Sep 02, 2024, Selvarasu Ganesan wrote:
> >>>> I would like to reconfirm from our end that in our failure scenario, we
> >>>> observe that DWC3_EVENT_PENDING is set in evt->flags when the dwc3
> >>>> resume sequence is executed, and the dwc->pending_events flag is not
> >>>> being set.
> >>>>
> >>> If the controller is stopped, no event is generated until it's restarted
> >>> again. (ie, you should not see GEVNTCOUNT updated after clearing
> >>> DCTL.run_stop). If there's no event, no interrupt assertion should come
> >>> from the controller.
> >>>
> >>> If the pending_events is not set and you still see this failure, then
> >>> likely that the controller had started, and the interrupt is generated
> >>> from the controller event. This occurs along with the interrupt
> >>> generated from your connection notification from your setup.
> >>
> >> I completely agree. My discussion revolves around the handling of the
> >> DWC3_EVENT_PENDING flag in all situations. The purpose of using this
> >> flag is to prevent the processing of new events if an existing event is
> >> still being processed. This flag is set in the top-half interrupt
> >> handler and cleared at the end of the bottom-half handler.
> >>
> >> Now, let's consider scenarios where the bottom half is not scheduled,
> >> and a USB reconnect occurs. In this case, there is a possibility that
> >> the interrupt line is unmasked in dwc3_event_buffers_setup, and the USB
> >> controller begins posting new events. The top-half interrupt handler
> >> checks for the DWC3_EVENT_PENDING flag and returns IRQ_HANDLED without
> >> processing any new events. However, the USB controller continues to post
> >> interrupts until they are acknowledged.
> >>
> >> Please review the complete sequence once with DWC3_EVENT_PENDING flag.
> >>
> >> My proposal is to clear or reset the DWC3_EVENT_PENDING flag when
> >> unmasking the interrupt line dwc3_event_buffers_setup, apart from
> >> bottom-half handler. Clearing the DWC3_EVENT_PENDING flag in
> >> dwc3_event_buffers_setup does not cause any harm, as we have implemented
> >> a temporary workaround in our test setup to prevent IRQ storms.
> >>
> >>
> >>
> >> Working scenarios:
> >> ==================
> >> 1. Top-half handler:
> >>       a. if (evt->flags & DWC3_EVENT_PENDING)
> >>           return IRQ_HANDLED;
> >>       b. Set DWC3_EVENT_PENDING flag
> >>       c. Masking interrupt line
> >>
> >> 2. Bottom-half handler:
> >>       a. Un-masking interrupt line
> >>       b. Clear DWC3_EVENT_PENDING flag
> >>
> >> Failure scenarios:
> >> ==================
> >> 1. Top-half handler:
> >>       a. if (evt->flags & DWC3_EVENT_PENDING)
> >>                   return IRQ_HANDLED;
> >>       b. Set DWC3_EVENT_PENDING flag
> >>       c. Masking interrupt line
> > For DWC3_EVENT_PENDING flag to be set at this point (before we start the
> > controller), that means that the GEVNTCOUNT was not 0 after
> > soft-disconnect and that the pm_runtime_suspended() must be false.
> 
> In the top-half code where we set the DWC3_EVENT_PENDING flag, we 
> acknowledge GEVNTCOUNT. Therefore, I think it is not necessary for 
> GEVNTCOUNT to have a non-zero value until a new event occurs. In fact, 
> when we tried to print GEVNTCOUNT in a non-interrupt context, we found 
> that it was zero, where we received DWC3_EVENT_PENDING being set in 
> non-interrupt context.

For DWC3_EVENT_PENDING to be set, GEVNTCOUNT must be non-zero. If you
see it's zero, that means that it was already decremented by the driver.

If the driver acknowledges the GEVNTCOUNT, then that means that the
events are copied and prepared to be processed. The bottom-half thread
is scheduled. If it's for stale event, I don't want it to be processed.

> 
> >
> >> 2. No Bottom-half scheduled:
> > Why is the bottom-half not scheduled? Or do you mean it hasn't woken up
> > yet before the next top-half coming?
> 
> In very rare cases, it is possible in our platform that the CPU may not 
> be able to schedule the bottom half of the dwc3 interrupt because a work 
> queue lockup has occurred on the same CPU that is attempting to schedule 
> the dwc3 thread interrupt. In this case Yes, the bottom-half handler 
> hasn't woken up, then initiate an IRQ storm for new events after the 
> controller restarts, resulting in no more bottom-half scheduling due to 
> the CPU being stuck in processing continuous interrupts and return 
> IRQ_HANDLED by checking if (evt->flags & DWC3_EVENT_PENDING).
> 
> >
> >> 3. USB reconnect: dwc3_event_buffers_setup
> >>       a. Un-masking interrupt line
> > Do we know that the GEVNTCOUNT is non-zero before starting the
> > controller again?
> 
> The GEVNTCOUNT value showing as zero that we confirmed by adding debug 
> message here.
> >
> >> 4. Continuous interrupts : Top-half handler:
> >>       a. if (evt->flags & DWC3_EVENT_PENDING)
> >>                   return IRQ_HANDLED;
> >>
> >>       a. if (evt->flags & DWC3_EVENT_PENDING)
> >>                   return IRQ_HANDLED;
> >>
> >>       a. if (evt->flags & DWC3_EVENT_PENDING)
> >>                   return IRQ_HANDLED;
> >> .....
> >>
> >> .....
> >>
> >> .....
> >>
> 
> Sure, I can try implementing the proposed code modifications in our 
> testing environment.
> 
> But, I am uncertain about how these changes will effectively prevent an 
> IRQ storm when the USB controller sequence restarts with the 
> DWC3_EVENT_PENDING. The following code will only execute until the 
> DWC3_EVENT_PENDING is cleared, at which point the previous bottom-half 
> will not be scheduled.
> 
> Please correct me if i am wrong in my above understanding.

As I mentioned, I don't want DWC3_EVENT_PENDING flag to be set due to
the stale event. I want to ignore and skip processing any stale event.

The DWC3_EVENT_PENDING should not be set by the time
dwc3_event_buffers_setup() is called.

Specifically review this condition in my testing patch:

	/*
	 * If the controller is halted, the event count is stale/invalid. Ignore
	 * them. This happens if the interrupt assertion is from an out-of-band
	 * resume notification.
	 */
	if (!dwc->pullups_connected && count) {
		dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count);
		return IRQ_HANDLED;
	}

Let me know if the condition matches with what's happening for your
case.

Thanks,
Thinh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ