lists.openwall.net: Open Source and information security mailing list archives
Message-ID: <f9561f03-5f83-4270-b7f3-17b880cfabfe@samsung.com>
Date: Fri, 6 Sep 2024 04:35:38 +0530
From: Selvarasu Ganesan <selvarasu.g@...sung.com>
To: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
Cc: "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
 "linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "jh0801.jung@...sung.com" <jh0801.jung@...sung.com>,
 "dh10.jung@...sung.com" <dh10.jung@...sung.com>,
 "naushad@...sung.com" <naushad@...sung.com>,
 "akash.m5@...sung.com" <akash.m5@...sung.com>,
 "rc93.raju@...sung.com" <rc93.raju@...sung.com>,
 "taehyun.cho@...sung.com" <taehyun.cho@...sung.com>,
 "hongpooh.kim@...sung.com" <hongpooh.kim@...sung.com>,
 "eomji.oh@...sung.com" <eomji.oh@...sung.com>,
 "shijie.cai@...sung.com" <shijie.cai@...sung.com>
Subject: Re: [PATCH] usb: dwc3: Potential fix of possible dwc3 interrupt storm

On 9/6/2024 2:43 AM, Thinh Nguyen wrote:
> On Thu, Sep 05, 2024, Selvarasu Ganesan wrote:
>> On 9/5/2024 5:56 AM, Thinh Nguyen wrote:
>>> On Wed, Sep 04, 2024, Selvarasu Ganesan wrote:
>>>> On 9/4/2024 6:33 AM, Thinh Nguyen wrote:
>>>>> On Mon, Sep 02, 2024, Selvarasu Ganesan wrote:
>>>>>> I would like to reconfirm from our end that in our failure scenario, we
>>>>>> observe that DWC3_EVENT_PENDING is set in evt->flags when the dwc3
>>>>>> resume sequence is executed, and the dwc->pending_events flag is not
>>>>>> being set.
>>>>>>
>>>>> If the controller is stopped, no event is generated until it's restarted
>>>>> again. (ie, you should not see GEVNTCOUNT updated after clearing
>>>>> DCTL.run_stop). If there's no event, no interrupt assertion should come
>>>>> from the controller.
>>>>>
>>>>> If the pending_events is not set and you still see this failure, then
>>>>> likely that the controller had started, and the interrupt is generated
>>>>> from the controller event. This occurs along with the interrupt
>>>>> generated from your connection notification from your setup.
>>>> I completely agree. My discussion revolves around the handling of the
>>>> DWC3_EVENT_PENDING flag in all situations. The purpose of using this
>>>> flag is to prevent the processing of new events if an existing event is
>>>> still being processed. This flag is set in the top-half interrupt
>>>> handler and cleared at the end of the bottom-half handler.
>>>>
>>>> Now, let's consider scenarios where the bottom half is not scheduled,
>>>> and a USB reconnect occurs. In this case, there is a possibility that
>>>> the interrupt line is unmasked in dwc3_event_buffers_setup, and the USB
>>>> controller begins posting new events. The top-half interrupt handler
>>>> checks for the DWC3_EVENT_PENDING flag and returns IRQ_HANDLED without
>>>> processing any new events. However, the USB controller continues to post
>>>> interrupts until they are acknowledged.
>>>>
>>>> Please review the complete sequence once with DWC3_EVENT_PENDING flag.
>>>>
>>>> My proposal is to clear or reset the DWC3_EVENT_PENDING flag when
>>>> unmasking the interrupt line dwc3_event_buffers_setup, apart from
>>>> bottom-half handler. Clearing the DWC3_EVENT_PENDING flag in
>>>> dwc3_event_buffers_setup does not cause any harm, as we have implemented
>>>> a temporary workaround in our test setup to prevent IRQ storms.
>>>>
>>>> Working scenarios:
>>>> ==================
>>>> 1. Top-half handler:
>>>>     a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>            return IRQ_HANDLED;
>>>>     b. Set DWC3_EVENT_PENDING flag
>>>>     c. Masking interrupt line
>>>>
>>>> 2. Bottom-half handler:
>>>>     a. Un-masking interrupt line
>>>>     b. Clear DWC3_EVENT_PENDING flag
>>>>
>>>> Failure scenarios:
>>>> ==================
>>>> 1. Top-half handler:
>>>>     a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>            return IRQ_HANDLED;
>>>>     b. Set DWC3_EVENT_PENDING flag
>>>>     c. Masking interrupt line
>>> For DWC3_EVENT_PENDING flag to be set at this point (before we start the
>>> controller), that means that the GEVNTCOUNT was not 0 after
>>> soft-disconnect and that the pm_runtime_suspended() must be false.
>> In the top-half code where we set the DWC3_EVENT_PENDING flag, we
>> acknowledge GEVNTCOUNT. Therefore, I think it is not necessary for
>> GEVNTCOUNT to have a non-zero value until a new event occurs. In fact,
>> when we tried to print GEVNTCOUNT in a non-interrupt context, we found
>> that it was zero, where we received DWC3_EVENT_PENDING being set in
>> non-interrupt context.
> For DWC3_EVENT_PENDING to be set, GEVNTCOUNT must be non-zero. If you
> see it's zero, that means that it was already decremented by the driver.
>
> If the driver acknowledges the GEVNTCOUNT, then that means that the
> events are copied and prepared to be processed. The bottom-half thread
> is scheduled. If it's for stale event, I don't want it to be processed.
>
>>>> 2. No Bottom-half scheduled:
>>> Why is the bottom-half not scheduled? Or do you mean it hasn't woken up
>>> yet before the next top-half coming?
>> In very rare cases, it is possible in our platform that the CPU may not
>> be able to schedule the bottom half of the dwc3 interrupt because a work
>> queue lockup has occurred on the same CPU that is attempting to schedule
>> the dwc3 thread interrupt. In this case Yes, the bottom-half handler
>> hasn't woken up, then initiate an IRQ storm for new events after the
>> controller restarts, resulting in no more bottom-half scheduling due to
>> the CPU being stuck in processing continuous interrupts and return
>> IRQ_HANDLED by checking if (evt->flags & DWC3_EVENT_PENDING).
>>
>>>> 3. USB reconnect: dwc3_event_buffers_setup
>>>>     a. Un-masking interrupt line
>>> Do we know that the GEVNTCOUNT is non-zero before starting the
>>> controller again?
>> The GEVNTCOUNT value showing as zero that we confirmed by adding debug
>> message here.
>>>> 4. Continuous interrupts : Top-half handler:
>>>>     a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>            return IRQ_HANDLED;
>>>>
>>>>     a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>            return IRQ_HANDLED;
>>>>
>>>>     a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>            return IRQ_HANDLED;
>>>> .....
>>>>
>>>> .....
>>>>
>>>> .....
>>>>
>> Sure, I can try implementing the proposed code modifications in our
>> testing environment.
>>
>> But, I am uncertain about how these changes will effectively prevent an
>> IRQ storm when the USB controller sequence restarts with the
>> DWC3_EVENT_PENDING. The following code will only execute until the
>> DWC3_EVENT_PENDING is cleared, at which point the previous bottom-half
>> will not be scheduled.
>>
>> Please correct me if i am wrong in my above understanding.
> As I mentioned, I don't want DWC3_EVENT_PENDING flag to be set due to
> the stale event. I want to ignore and skip processing any stale event.
>
> The DWC3_EVENT_PENDING should not be set by the time
> dwc3_event_buffers_setup() is called.
>
> Specifically review this condition in my testing patch:
>
>         /*
>          * If the controller is halted, the event count is stale/invalid. Ignore
>          * them. This happens if the interrupt assertion is from an out-of-band
>          * resume notification.
>          */
>         if (!dwc->pullups_connected && count) {
>                 dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count);
>                 return IRQ_HANDLED;
>         }
>
> Let me know if the condition matches with what's happening for your
> case.

Hi Thinh,

Thanks for your continued reviews and suggestions.

The given condition does not match our case either. As I mentioned at the
start of this thread, please refer to the link below to an older discussion
of a similar issue from Samsung:

https://lore.kernel.org/linux-usb/20230102050831.105499-1-jh0801.jung@samsung.com/

The DWC3_EVENT_PENDING flag is set while count is 0, which means there are
no interrupts to handle:
(struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
    (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
    (void *) cache = 0xFFFFFF8839F54080,
    (unsigned int) length = 0x1000,
    (unsigned int) lpos = 0x0,
    *(unsigned int) count = 0x0,
    (unsigned int) flags = 0x00000001,*
    (dma_addr_t) dma = 0x00000008BD7D7000,
    (struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
    (u64) android_kabi_reserved1 = 0x0),

IRQ storm:
    (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
    (time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
    (time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
    (time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
    (time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
    (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
    ...
    ...
    ...

We are also fine with the code change below, which you suggested earlier:

https://lore.kernel.org/linux-usb/20230109190914.3blihjfjdcszazdd@synopsys.com/

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 65500246323b..3c36dfdb88f0 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -5515,8 +5515,15 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
 	 * irq event handler completes before caching new event to prevent
 	 * losing events.
 	 */
-	if (evt->flags & DWC3_EVENT_PENDING)
+	if (evt->flags & DWC3_EVENT_PENDING) {
+		if (!evt->count) {
+			u32 reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
+
+			if (!(reg & DWC3_GEVNTSIZ_INTMASK))
+				evt->flags &= ~DWC3_EVENT_PENDING;
+		}
 		return IRQ_HANDLED;
+	}

Thanks,
Selva

> Thanks,
> Thinh
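The handshake described in this thread (the top half sets DWC3_EVENT_PENDING and masks the interrupt; the bottom half unmasks it and clears the flag) and the effect of the check in the diff above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not driver code: `struct model`, `top_half()`, `bottom_half()`, `buffers_setup()` and `run_failure_scenario()` are hypothetical names, registers are plain fields, and the starting state (flag set, count 0, interrupt masked) is taken from the dump rather than reproduced from a real trace.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the dwc3 event-buffer handshake
 * discussed in this thread. Names are illustrative only; no real
 * registers or driver APIs are used.
 */
#define EVENT_PENDING 0x1u /* stands in for DWC3_EVENT_PENDING */

enum irq_ret { RET_NONE, RET_HANDLED, RET_WAKE_THREAD };

struct model {
	unsigned int hw_count; /* stands in for GEVNTCOUNT (bytes) */
	bool intmask;          /* stands in for DWC3_GEVNTSIZ_INTMASK */
	unsigned int count;    /* stands in for evt->count */
	unsigned int flags;    /* stands in for evt->flags */
	bool apply_fix;        /* enables the check from the proposed diff */
};

/* The line stays asserted while events are unacknowledged and unmasked. */
static bool irq_asserted(const struct model *m)
{
	return m->hw_count > 0 && !m->intmask;
}

/* Top half, loosely modelled on dwc3_check_event_buf(). */
static enum irq_ret top_half(struct model *m)
{
	if (m->flags & EVENT_PENDING) {
		/* Proposed check: flag set, nothing cached, line unmasked. */
		if (m->apply_fix && m->count == 0 && !m->intmask)
			m->flags &= ~EVENT_PENDING;
		return RET_HANDLED; /* nothing acknowledged: line stays asserted */
	}
	if (m->hw_count == 0)
		return RET_NONE;
	m->count = m->hw_count;    /* copy events to the cache */
	m->flags |= EVENT_PENDING; /* set the pending flag */
	m->intmask = true;         /* mask the interrupt line */
	m->hw_count = 0;           /* acknowledge the event count */
	return RET_WAKE_THREAD;
}

/* Bottom half: process cached events, unmask, clear the flag last. */
static void bottom_half(struct model *m)
{
	m->count = 0;
	m->intmask = false;
	m->flags &= ~EVENT_PENDING;
}

/* Reconnect path: dwc3_event_buffers_setup() unmasks the interrupt. */
static void buffers_setup(struct model *m)
{
	m->intmask = false;
}

/*
 * Replays the failure scenario and returns the number of top-half calls
 * made before the line deasserts, capped at limit.
 */
static int run_failure_scenario(bool apply_fix, int limit)
{
	/* Starting state assumed from the dump: flag set, count 0, masked. */
	struct model m = { .flags = EVENT_PENDING, .intmask = true,
			   .apply_fix = apply_fix };
	enum irq_ret ret = RET_NONE;
	int calls = 0;

	assert(limit > 0);
	buffers_setup(&m); /* USB reconnect unmasks the line */
	m.hw_count = 4;    /* controller posts one new 4-byte event */

	while (irq_asserted(&m) && calls < limit) {
		ret = top_half(&m);
		calls++;
	}
	if (ret == RET_WAKE_THREAD)
		bottom_half(&m); /* the thread can run once the storm stops */
	return calls;
}
```

In this model, `run_failure_scenario(false, 1000)` hits the 1000-call cap, because every top-half call returns early without acknowledging the new event, while `run_failure_scenario(true, 1000)` returns 2: the first call clears the stale flag, and the second caches the event, acknowledges it, and masks the line.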