[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240911002408.gr4fv5vkst7ukxd5@synopsys.com>
Date: Wed, 11 Sep 2024 00:24:10 +0000
From: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
To: Selvarasu Ganesan <selvarasu.g@...sung.com>
CC: Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"jh0801.jung@...sung.com" <jh0801.jung@...sung.com>,
"dh10.jung@...sung.com" <dh10.jung@...sung.com>,
"naushad@...sung.com" <naushad@...sung.com>,
"akash.m5@...sung.com" <akash.m5@...sung.com>,
"rc93.raju@...sung.com" <rc93.raju@...sung.com>,
"taehyun.cho@...sung.com" <taehyun.cho@...sung.com>,
"hongpooh.kim@...sung.com" <hongpooh.kim@...sung.com>,
"eomji.oh@...sung.com" <eomji.oh@...sung.com>,
"shijie.cai@...sung.com" <shijie.cai@...sung.com>
Subject: Re: [PATCH] usb: dwc3: Potential fix of possible dwc3 interrupt storm
On Tue, Sep 10, 2024, Selvarasu Ganesan wrote:
>
> On 9/7/2024 6:09 AM, Thinh Nguyen wrote:
> > On Sat, Sep 07, 2024, Selvarasu Ganesan wrote:
> >> Hi Thinh,
> >>
> >> I ran the code you recommended on our testing environment and was able
> >> to reproduce the issue one time.
> >>
> >> When evt->flags contains DWC3_EVENT_PENDING, I've included the following
> >> debugging information: I added this debug message at the start of
> >> dwc3_event_buffers_cleanup and dwc3_event_buffers_setup functions in
> >> during suspend and resume.
> >>
> >> The results were quite interesting . I'm curious to understand how
> >> evt->flags is set to DWC3_EVENT_PENDING, and along with DWC3_GEVNTSIZ is
> >> equal to 0x1000 during the suspend.
> > That is indeed strange.
> >
> >> Its means that the previous bottom-half handler prior to suspend might
> >> still be executing in the middle of the process.
> >>
> >> Could you please give your suggestions here? And let me know if anything
> >> want to test or additional details are required.
> >>
> >>
> >> ##DBG: dwc3_event_buffers_cleanup:
> >> evt->length :0x1000
> >> evt->lpos :0x20c
> >> evt->count :0x0
> >> evt->flags :0x1 // This is Unexpected if DWC3_GEVNTSIZ(0)(0xc408):
> >> 0x00001000. Its means that previous bottom-half handler may be still
> >> running in middle
> > Perhaps.
> >
> > But I doubt that's the case since it shouldn't take that long for the
> > bottom-half to be completed before the next resume yet the flag is still
> > set.
> >
> >> DWC3_GEVNTSIZ(0)(0xc408) : 0x00001000
> >> DWC3_GEVNTCOUNT(0)(0xc40c) : 0x00000000
> >> DWC3_DCFG(0xc700) : 0x00e008a8
> >> DWC3_DCTL(0xc704) : 0x0cf00a00
> >> DWC3_DEVTEN(0xc708) : 0x00000000
> >> DWC3_DSTS(0xc70c) : 0x00d20cd1
> >>
> > The controller status is halted. So there's no problem with
> > soft-disconnect. For the interrupt mask in GEVNTSIZ to be cleared,
> > that likely means that the bottom-half had probably completed.
>
> Agree, But I am worrying on If the bottom-half is completed, then
> DWC3_EVENT_PENDING must be cleared in evt->flags.
> Is there any possibility of a CPU reordering issue when updating
> evt->flags in the bottom-half handler?.
> Should I try with wmb() when writing to evt->flags?
Assuming that the problem occurs after the bottom-half completed, there
should be implicit memory barrier. The memory operation should complete
before the release from spin_unlock complete. I don't think wmb() will
help.
> >
> >> ##DBG: dwc3_event_buffers_setup:
> >> evt->length :0x1000
> >> evt->lpos :0x20c
> > They fact that evt->lpos did not get updated tells me that there's
> > something wrong with memory access to your platform during suspend and
> > resume.
>
> Are you expecting the evt->lpos value to be zero here? If so, this is
> expected in our test setup because we avoid writing zero to evt->lpos as
> part of dwc3_event_buffers_cleanup if evt->flags have a value of 1. This
Oh ok. I did not know you made this modification.
> is simply to track the status of evt->lpos during suspend to resume when
> evt->flags have a value of DWC3_EVENT_PENDING. The following test codes
> for the reference.
>
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -505,8 +505,20 @@ static int dwc3_alloc_event_buffers(struct dwc3
> *dwc, unsigned int length)
> int dwc3_event_buffers_setup(struct dwc3 *dwc)
> {
> struct dwc3_event_buffer *evt;
> + u32 reg;
>
> evt = dwc->ev_buf;
> +
> + if (evt->flags & DWC3_EVENT_PENDING) {
> + pr_info("evt->length :%x\n", evt->length);
> + pr_info("evt->lpos :%x\n", evt->lpos);
> + pr_info("evt->count :%x\n", evt->count);
> + pr_info("evt->flags :%x\n", evt->flags);
> +
> + dwc3_exynos_reg_dump(dwc);
> +
> + }
> +
> evt->lpos = 0;
> dwc3_writel(dwc->regs, DWC3_GEVNTADRLO(0),
> lower_32_bits(evt->dma));
> @@ -514,8 +526,10 @@ int dwc3_event_buffers_setup(struct dwc3 *dwc)
> upper_32_bits(evt->dma));
> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0),
> DWC3_GEVNTSIZ_SIZE(evt->length));
> - dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0);
>
> + /* Clear any stale event */
> + reg = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
> + dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), reg);
> return 0;
> }
>
> @@ -525,7 +539,16 @@ void dwc3_event_buffers_cleanup(struct dwc3 *dwc)
>
> evt = dwc->ev_buf;
>
> - evt->lpos = 0;
> + if (evt->flags & DWC3_EVENT_PENDING) {
> + pr_info("evt->length :%x\n", evt->length);
> + pr_info("evt->lpos :%x\n", evt->lpos);
> + pr_info("evt->count :%x\n", evt->count);
> + pr_info("evt->flags :%x\n", evt->flags);
> +
> + dwc3_exynos_reg_dump(dwc);
> + } else {
> + evt->lpos = 0;
I wasn't aware of this change.
> + }
>
> >
> >> evt->count :0x0
> >> evt->flags :0x1 // Still It's not clearing in during resume.
> >>
> >> DWC3_GEVNTSIZ(0)(0xc408) : 0x00000000
> >> DWC3_GEVNTCOUNT(0)(0xc40c) : 0x00000000
> >> DWC3_DCFG(0xc700) : 0x00080800
> >> DWC3_DCTL(0xc704) : 0x00f00000
> >> DWC3_DEVTEN(0xc708) : 0x00000000
> >> DWC3_DSTS(0xc70c) : 0x00d20001
> >>
> > Please help look into your platform to see what condition triggers this
> > memory access issue. If this is a hardware quirk, we can properly update
> > the change and note it to be so.
>
> Sure I will try to figure it out. However, we are facing challenges in
> reproducing the issue. There could be a delay in understanding the
> conditions that trigger the memory issue if it is related to a memory issue.
>
> >
> > Thanks,
> > Thinh
> >
> > (If possible, for future tests, please dump the dwc3 tracepoints. Many
> > thanks for the tests.)
>
> I tried to get dwc3 traces in the failure case, but so far no instances
> have been reported. Our testing is still in progress with enable dwc3
> traces.
>
> I will keep posting once I get the dwc3 traces in the failure scenario.
>
Thanks for the update. I hope enabling of the driver tracepoints will
not impact the reproduction of the issue. With the driver log, we'll get
more clues to what was going on.
Thanks,
Thinh
Powered by blists - more mailing lists