Message-ID: <dff83c7d-56b8-481f-af69-8d4262bd54e4@samsung.com>
Date: Tue, 10 Sep 2024 19:07:28 +0530
From: Selvarasu Ganesan <selvarasu.g@...sung.com>
To: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
Cc: "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"jh0801.jung@...sung.com" <jh0801.jung@...sung.com>, "dh10.jung@...sung.com"
	<dh10.jung@...sung.com>, "naushad@...sung.com" <naushad@...sung.com>,
	"akash.m5@...sung.com" <akash.m5@...sung.com>, "rc93.raju@...sung.com"
	<rc93.raju@...sung.com>, "taehyun.cho@...sung.com"
	<taehyun.cho@...sung.com>, "hongpooh.kim@...sung.com"
	<hongpooh.kim@...sung.com>, "eomji.oh@...sung.com" <eomji.oh@...sung.com>,
	"shijie.cai@...sung.com" <shijie.cai@...sung.com>
Subject: Re: [PATCH] usb: dwc3: Potential fix of possible dwc3 interrupt
 storm


On 9/7/2024 6:09 AM, Thinh Nguyen wrote:
> On Sat, Sep 07, 2024, Selvarasu Ganesan wrote:
>> Hi Thinh,
>>
>> I ran the code you recommended on our testing environment and was able
>> to reproduce the issue one time.
>>
>> When evt->flags contains DWC3_EVENT_PENDING, I print the following
>> debugging information. I added this debug message at the start of the
>> dwc3_event_buffers_cleanup and dwc3_event_buffers_setup functions,
>> which run during suspend and resume.
>>
>> The results were quite interesting. I'm curious to understand how
>> evt->flags can be set to DWC3_EVENT_PENDING while DWC3_GEVNTSIZ is
>> equal to 0x1000 during suspend.
> That is indeed strange.
>
>> It means that the previous bottom-half handler, started before suspend,
>> might still be in the middle of executing.
>>
>> Could you please give your suggestions here? Let me know if there is
>> anything you want me to test or if additional details are required.
>>
>>
>> ##DBG: dwc3_event_buffers_cleanup:
>>    evt->length    :0x1000
>>    evt->lpos      :0x20c
>>    evt->count     :0x0
>>    evt->flags     :0x1 // Unexpected when DWC3_GEVNTSIZ(0)(0xc408) is
>> 0x00001000. It means that the previous bottom-half handler may still be
>> running.
> Perhaps.
>
> But I doubt that's the case since it shouldn't take that long for the
> bottom-half to be completed before the next resume yet the flag is still
> set.
>
>>    DWC3_GEVNTSIZ(0)(0xc408)       : 0x00001000
>>    DWC3_GEVNTCOUNT(0)(0xc40c)     : 0x00000000
>>    DWC3_DCFG(0xc700)              : 0x00e008a8
>>    DWC3_DCTL(0xc704)              : 0x0cf00a00
>>    DWC3_DEVTEN(0xc708)            : 0x00000000
>>    DWC3_DSTS(0xc70c)              : 0x00d20cd1
>>
> The controller status is halted. So there's no problem with
> soft-disconnect. For the interrupt mask in GEVNTSIZ to be cleared,
> that likely means that the bottom-half had probably completed.
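For reference, the reading of the suspend dump above can be sketched as a
small userspace decoder. The bit positions are my assumption of what
drivers/usb/dwc3/core.h defines (DWC3_GEVNTSIZ_INTMASK, DWC3_DCTL_RUN_STOP,
DWC3_DSTS_DEVCTRLHLT); the struct and function names are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to match drivers/usb/dwc3/core.h; verify against your tree. */
#define GEVNTSIZ_INTMASK  (1u << 31)        /* event interrupt is masked */
#define GEVNTSIZ_SIZE(n)  ((n) & 0xffffu)   /* event buffer size in bytes */
#define DCTL_RUN_STOP     (1u << 31)        /* controller is running */
#define DSTS_DEVCTRLHLT   (1u << 22)        /* device controller halted */

/* Hypothetical summary of the fields discussed in this thread. */
struct dwc3_dump_summary {
	bool     irq_masked;
	uint32_t evt_buf_size;
	bool     run_stop;
	bool     halted;
};

struct dwc3_dump_summary summarize_dump(uint32_t gevntsiz, uint32_t dctl,
					uint32_t dsts)
{
	struct dwc3_dump_summary s;

	s.irq_masked   = (gevntsiz & GEVNTSIZ_INTMASK) != 0;
	s.evt_buf_size = GEVNTSIZ_SIZE(gevntsiz);
	s.run_stop     = (dctl & DCTL_RUN_STOP) != 0;
	s.halted       = (dsts & DSTS_DEVCTRLHLT) != 0;
	return s;
}

void print_summary(const struct dwc3_dump_summary *s)
{
	assert(s != NULL);
	printf("irq_masked=%d size=0x%x run_stop=%d halted=%d\n",
	       s->irq_masked, (unsigned int)s->evt_buf_size,
	       s->run_stop, s->halted);
}
```

Feeding it the suspend values (GEVNTSIZ 0x00001000, DCTL 0x0cf00a00,
DSTS 0x00d20cd1) reports the interrupt as unmasked, the controller as
stopped and halted, which is why evt->flags still being 0x1 stands out.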

Agreed. But my concern is that if the bottom half had completed, then
DWC3_EVENT_PENDING should have been cleared in evt->flags.
Is there any possibility of a CPU reordering issue when evt->flags is
updated in the bottom-half handler?
Should I try adding wmb() when writing to evt->flags?
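
To make the ordering question concrete, here is a userspace C11 analogy,
not kernel code. It assumes the concern is that a reader of evt->flags
must also see the count and lpos updates made before the flag changed;
release/acquire atomics stand in for what wmb() or
smp_store_release()/smp_load_acquire() would do in the driver. The struct
and function names are hypothetical, and the sketch does not show that
reordering is the actual cause.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EVENT_PENDING 0x1u	/* same value as seen in evt->flags above */

/* Hypothetical stand-in for the fields of struct dwc3_event_buffer. */
struct evt_model {
	uint32_t    count;
	uint32_t    lpos;
	atomic_uint flags;
};

/* Top half: record the event count, then publish the pending flag. */
void top_half_mark_pending(struct evt_model *evt, uint32_t count)
{
	evt->count = count;
	atomic_fetch_or_explicit(&evt->flags, EVENT_PENDING,
				 memory_order_release);
}

/*
 * Bottom half: consume the events, then clear the pending flag last.
 * The release ordering makes the lpos/count stores visible to any
 * thread that later observes the cleared flag with acquire ordering.
 */
void bottom_half_finish(struct evt_model *evt, uint32_t length)
{
	assert(length != 0);
	evt->lpos = (evt->lpos + evt->count) % length;
	evt->count = 0;
	atomic_fetch_and_explicit(&evt->flags, ~EVENT_PENDING,
				  memory_order_release);
}

/* Suspend path: check whether a bottom half is still outstanding. */
bool suspend_sees_idle(struct evt_model *evt)
{
	unsigned int flags =
		atomic_load_explicit(&evt->flags, memory_order_acquire);

	return !(flags & EVENT_PENDING);
}
```

If the flag were stuck only because of store ordering, a pairing like this
should make the cleared flag visible by the time suspend checks it.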
>
>> ##DBG: dwc3_event_buffers_setup:
>>    evt->length    :0x1000
>>    evt->lpos      :0x20c
> The fact that evt->lpos did not get updated tells me that there's
> something wrong with memory access to your platform during suspend and
> resume.

Are you expecting the evt->lpos value to be zero here? If so, a non-zero
value is expected in our test setup, because dwc3_event_buffers_cleanup
skips writing zero to evt->lpos when evt->flags has a value of 1. This
is only to track evt->lpos from suspend to resume when evt->flags has
DWC3_EVENT_PENDING set. The test code follows for reference.

--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -505,8 +505,20 @@ static int dwc3_alloc_event_buffers(struct dwc3 *dwc, unsigned int length)
  int dwc3_event_buffers_setup(struct dwc3 *dwc)
  {
         struct dwc3_event_buffer        *evt;
+       u32                             reg;

         evt = dwc->ev_buf;
+
+       if (evt->flags & DWC3_EVENT_PENDING) {
+               pr_info("evt->length :%x\n", evt->length);
+               pr_info("evt->lpos :%x\n", evt->lpos);
+               pr_info("evt->count :%x\n", evt->count);
+               pr_info("evt->flags :%x\n", evt->flags);
+
+               dwc3_exynos_reg_dump(dwc);
+
+       }
+
         evt->lpos = 0;
         dwc3_writel(dwc->regs, DWC3_GEVNTADRLO(0),
                         lower_32_bits(evt->dma));
@@ -514,8 +526,10 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc)
                         upper_32_bits(evt->dma));
         dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
                         DWC3_GEVNTSIZ_SIZE(evt->length));
-       dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0);

+       /* Clear any stale event */
+       reg = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
+       dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), reg);
         return 0;
  }

@@ -525,7 +539,16 @@ void dwc3_event_buffers_cleanup(struct dwc3 *dwc)

         evt = dwc->ev_buf;

-       evt->lpos = 0;
+       if (evt->flags & DWC3_EVENT_PENDING) {
+               pr_info("evt->length :%x\n", evt->length);
+               pr_info("evt->lpos :%x\n", evt->lpos);
+               pr_info("evt->count :%x\n", evt->count);
+               pr_info("evt->flags :%x\n", evt->flags);
+
+               dwc3_exynos_reg_dump(dwc);
+       } else {
+               evt->lpos = 0;
+       }
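
The "Clear any stale event" change in the setup hunk can be modelled in
userspace as follows. This is a sketch under the assumption that a write
to GEVNTCOUNT decrements the hardware count by the value written, so
writing 0 leaves stale events in place while writing back the value just
read clears them. The model struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the GEVNTCOUNT register. */
struct gevntcount_model {
	uint32_t pending_bytes;
};

uint32_t model_read(const struct gevntcount_model *hw)
{
	assert(hw != NULL);
	return hw->pending_bytes;
}

/* Assumed semantics: the written value is subtracted from the count. */
void model_write(struct gevntcount_model *hw, uint32_t val)
{
	assert(hw != NULL);
	if (val > hw->pending_bytes)
		val = hw->pending_bytes;	/* never go below zero */
	hw->pending_bytes -= val;
}

/* Mirrors the read-then-write-back sequence from the test patch. */
void clear_stale_events(struct gevntcount_model *hw)
{
	uint32_t reg = model_read(hw);

	model_write(hw, reg);
}
```

Under that assumption, the old write of 0 is a no-op on a non-zero count,
whereas the read-then-write-back sequence always leaves the count at zero.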

>
>>    evt->count     :0x0
>>    evt->flags     :0x1 // Still not cleared during resume.
>>
>>    DWC3_GEVNTSIZ(0)(0xc408)       : 0x00000000
>>    DWC3_GEVNTCOUNT(0)(0xc40c)     : 0x00000000
>>    DWC3_DCFG(0xc700)              : 0x00080800
>>    DWC3_DCTL(0xc704)              : 0x00f00000
>>    DWC3_DEVTEN(0xc708)            : 0x00000000
>>    DWC3_DSTS(0xc70c)              : 0x00d20001
>>
> Please help look into your platform to see what condition triggers this
> memory access issue. If this is a hardware quirk, we can properly update
> the change and note it to be so.

Sure, I will try to figure it out. However, the issue is hard to
reproduce, so if it does turn out to be a memory access issue, it may
take some time to identify the conditions that trigger it.

>
> Thanks,
> Thinh
>
> (If possible, for future tests, please dump the dwc3 tracepoints. Many
> thanks for the tests.)

I tried to capture dwc3 traces for the failure case, but the failure has
not recurred so far. Our testing is still in progress with dwc3 traces
enabled.

I will post an update once I have dwc3 traces from a failure.


Thanks,
Selva
