lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f9561f03-5f83-4270-b7f3-17b880cfabfe@samsung.com>
Date: Fri, 6 Sep 2024 04:35:38 +0530
From: Selvarasu Ganesan <selvarasu.g@...sung.com>
To: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
Cc: "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"jh0801.jung@...sung.com" <jh0801.jung@...sung.com>, "dh10.jung@...sung.com"
	<dh10.jung@...sung.com>, "naushad@...sung.com" <naushad@...sung.com>,
	"akash.m5@...sung.com" <akash.m5@...sung.com>, "rc93.raju@...sung.com"
	<rc93.raju@...sung.com>, "taehyun.cho@...sung.com"
	<taehyun.cho@...sung.com>, "hongpooh.kim@...sung.com"
	<hongpooh.kim@...sung.com>, "eomji.oh@...sung.com" <eomji.oh@...sung.com>,
	"shijie.cai@...sung.com" <shijie.cai@...sung.com>
Subject: Re: [PATCH] usb: dwc3: Potential fix of possible dwc3 interrupt
 storm


On 9/6/2024 2:43 AM, Thinh Nguyen wrote:
> On Thu, Sep 05, 2024, Selvarasu Ganesan wrote:
>> On 9/5/2024 5:56 AM, Thinh Nguyen wrote:
>>> On Wed, Sep 04, 2024, Selvarasu Ganesan wrote:
>>>> On 9/4/2024 6:33 AM, Thinh Nguyen wrote:
>>>>> On Mon, Sep 02, 2024, Selvarasu Ganesan wrote:
>>>>>> I would like to reconfirm from our end that in our failure scenario, we
>>>>>> observe that DWC3_EVENT_PENDING is set in evt->flags when the dwc3
>>>>>> resume sequence is executed, and the dwc->pending_events flag is not
>>>>>> being set.
>>>>>>
>>>>> If the controller is stopped, no event is generated until it's restarted
>>>>> again. (ie, you should not see GEVNTCOUNT updated after clearing
>>>>> DCTL.run_stop). If there's no event, no interrupt assertion should come
>>>>> from the controller.
>>>>>
>>>>> If the pending_events is not set and you still see this failure, then
>>>>> likely that the controller had started, and the interrupt is generated
>>>>> from the controller event. This occurs along with the interrupt
>>>>> generated from your connection notification from your setup.
>>>> I completely agree. My discussion revolves around the handling of the
>>>> DWC3_EVENT_PENDING flag in all situations. The purpose of using this
>>>> flag is to prevent the processing of new events if an existing event is
>>>> still being processed. This flag is set in the top-half interrupt
>>>> handler and cleared at the end of the bottom-half handler.
>>>>
>>>> Now, let's consider scenarios where the bottom half is not scheduled,
>>>> and a USB reconnect occurs. In this case, there is a possibility that
>>>> the interrupt line is unmasked in dwc3_event_buffers_setup, and the USB
>>>> controller begins posting new events. The top-half interrupt handler
>>>> checks for the DWC3_EVENT_PENDING flag and returns IRQ_HANDLED without
>>>> processing any new events. However, the USB controller continues to post
>>>> interrupts until they are acknowledged.
>>>>
>>>> Please review the complete sequence once with DWC3_EVENT_PENDING flag.
>>>>
>>>> My proposal is to clear or reset the DWC3_EVENT_PENDING flag when
>>>> unmasking the interrupt line dwc3_event_buffers_setup, apart from
>>>> bottom-half handler. Clearing the DWC3_EVENT_PENDING flag in
>>>> dwc3_event_buffers_setup does not cause any harm, as we have implemented
>>>> a temporary workaround in our test setup to prevent IRQ storms.
>>>>
>>>>
>>>>
>>>> Working scenarios:
>>>> ==================
>>>> 1. Top-half handler:
>>>>        a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>            return IRQ_HANDLED;
>>>>        b. Set DWC3_EVENT_PENDING flag
>>>>        c. Masking interrupt line
>>>>
>>>> 2. Bottom-half handler:
>>>>        a. Un-masking interrupt line
>>>>        b. Clear DWC3_EVENT_PENDING flag
>>>>
>>>> Failure scenarios:
>>>> ==================
>>>> 1. Top-half handler:
>>>>        a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>                    return IRQ_HANDLED;
>>>>        b. Set DWC3_EVENT_PENDING flag
>>>>        c. Masking interrupt line
>>> For DWC3_EVENT_PENDING flag to be set at this point (before we start the
>>> controller), that means that the GEVNTCOUNT was not 0 after
>>> soft-disconnect and that the pm_runtime_suspended() must be false.
>> In the top-half code where we set the DWC3_EVENT_PENDING flag, we
>> acknowledge GEVNTCOUNT. Therefore, I think it is not necessary for
>> GEVNTCOUNT to have a non-zero value until a new event occurs. In fact,
>> when we tried to print GEVNTCOUNT in a non-interrupt context, we found
>> that it was zero, where we received DWC3_EVENT_PENDING being set in
>> non-interrupt context.
> For DWC3_EVENT_PENDING to be set, GEVNTCOUNT must be non-zero. If you
> see it's zero, that means that it was already decremented by the driver.
>
> If the driver acknowledges the GEVNTCOUNT, then that means that the
> events are copied and prepared to be processed. The bottom-half thread
> is scheduled. If it's for stale event, I don't want it to be processed.
>
>>>> 2. No Bottom-half scheduled:
>>> Why is the bottom-half not scheduled? Or do you mean it hasn't woken up
>>> yet before the next top-half coming?
>> In very rare cases, it is possible in our platform that the CPU may not
>> be able to schedule the bottom half of the dwc3 interrupt because a work
>> queue lockup has occurred on the same CPU that is attempting to schedule
>> the dwc3 thread interrupt. In this case Yes, the bottom-half handler
>> hasn't woken up, then initiate an IRQ storm for new events after the
>> controller restarts, resulting in no more bottom-half scheduling due to
>> the CPU being stuck in processing continuous interrupts and return
>> IRQ_HANDLED by checking if (evt->flags & DWC3_EVENT_PENDING).
>>
>>>> 3. USB reconnect: dwc3_event_buffers_setup
>>>>        a. Un-masking interrupt line
>>> Do we know that the GEVNTCOUNT is non-zero before starting the
>>> controller again?
>> The GEVNTCOUNT value showing as zero that we confirmed by adding debug
>> message here.
>>>> 4. Continuous interrupts : Top-half handler:
>>>>        a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>                    return IRQ_HANDLED;
>>>>
>>>>        a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>                    return IRQ_HANDLED;
>>>>
>>>>        a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>                    return IRQ_HANDLED;
>>>> .....
>>>>
>>>> .....
>>>>
>>>> .....
>>>>
>> Sure, I can try implementing the proposed code modifications in our
>> testing environment.
>>
>> But, I am uncertain about how these changes will effectively prevent an
>> IRQ storm when the USB controller sequence restarts with the
>> DWC3_EVENT_PENDING. The following code will only execute until the
>> DWC3_EVENT_PENDING is cleared, at which point the previous bottom-half
>> will not be scheduled.
>>
>> Please correct me if i am wrong in my above understanding.
> As I mentioned, I don't want DWC3_EVENT_PENDING flag to be set due to
> the stale event. I want to ignore and skip processing any stale event.
>
> The DWC3_EVENT_PENDING should not be set by the time
> dwc3_event_buffers_setup() is called.
>
> Specifically review this condition in my testing patch:
>
> 	/*
> 	 * If the controller is halted, the event count is stale/invalid. Ignore
> 	 * them. This happens if the interrupt assertion is from an out-of-band
> 	 * resume notification.
> 	 */
> 	if (!dwc->pullups_connected && count) {
> 		dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count);
> 		return IRQ_HANDLED;
> 	}
>
> Let me know if the condition matches with what's happening for your
> case.
Hi Thinh,

Thanks for your continuous reviews and suggestions.

The given condition also will not matches in our case.
As i mentioned in starting of this thread please refer once the below 
link of older discussion for similar issue from Samsung..

https://lore.kernel.org/linux-usb/20230102050831.105499-1-jh0801.jung@samsung.com/


DWC3_EVENT_PENDING flags set when count is 0.
It means "There are no interrupts to handle.".

(struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
	(void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
	(void *) cache = 0xFFFFFF8839F54080,
	(unsigned int) length = 0x1000,
	(unsigned int) lpos = 0x0,
*(unsigned int) count = 0x0, (unsigned int) flags = 0x00000001,*
	(dma_addr_t) dma = 0x00000008BD7D7000,
	(struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
	(u64) android_kabi_reserved1 = 0x0),

IRQ Storm:
(time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
...
...
...


We are also fine with below code changes as you suggested earlier.
https://lore.kernel.org/linux-usb/20230109190914.3blihjfjdcszazdd@synopsys.com/

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 65500246323b..3c36dfdb88f0 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -5515,8 +5515,15 @@ static irqreturn_t dwc3_check_event_buf(struct 
dwc3_event_buffer *evt)
          * irq event handler completes before caching new event to prevent
          * losing events.
          */
-       if (evt->flags & DWC3_EVENT_PENDING)
+       if (evt->flags & DWC3_EVENT_PENDING) {
+               if (!evt->count) {
+                       u32 reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
+
+                       if (!(reg & DWC3_GEVNTSIZ_INTMASK))
+                               evt->flags &= ~DWC3_EVENT_PENDING;
+               }
                 return IRQ_HANDLED;
+       }


Thanks,
Selva


> .
> Thanks,
> Thinh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ