[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 10 Oct 2023 15:28:54 +0800
From: wuliangfeng <william.wu@...k-chips.com>
To: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
Cc: "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"frank.wang@...k-chips.com" <frank.wang@...k-chips.com>,
"jianwei.zheng@...k-chips.com" <jianwei.zheng@...k-chips.com>,
"yangbin@...k-chips.com" <yangbin@...k-chips.com>
Subject: Re: [PATCH v2] usb: dwc3: core: Avoid resume dwc3 if already
suspended in pm resume
Hi Thinh,
Sorry for my late reply because the national day holiday.
在 2023/10/3 2:18, Thinh Nguyen 写道:
> On Fri, Sep 15, 2023, wuliangfeng wrote:
>> Hi Thinh,
>>
>> On 2023/9/12 8:08, Thinh Nguyen wrote:
>>> Hi,
>>>
>>> On Mon, Sep 11, 2023, William Wu wrote:
>>>> If we enable PM runtime auto suspend for dwc3 on rockchip
>>>> platforms (e.g. RK3562), it allows the dwc3 controller to
>>>> enter runtime suspend if usb cable detached and power off
>>>> the power domain of the controller. When system resume, if
>>>> the dwc3 already in runtime suspended, it Shouldn't access
>>>> the dwc3 registers in dwc3_resume() because its power domain
>>>> maybe power off.
>>>>
>>>> Test on RK3562 tablet, this patch can help to avoid kernel
>>>> panic when accessing the dwc3 registers in dwc3_resume() if
>>>> the dwc3 is in runtime suspended and it's power domain is
>>>> power off.
>>> The controller should be woken up before this step. Can you provide more
>>> detail on what led to this?
>> Yes, the power domain of the usb controller will be enabled by the framework
>> of the pm generic domain before dwc3 resume if the system enter suspend and
>> exit suspend normally. However, in my test case,if the system fail to enter
>> suspend because of some devices's problem, and then goto recovery process,
>> the power domain of the usb controller will not be enable before dwc3
>> resume.
> Ok.
>
>>> e.g. some questions:
>>> Who handles the waking up of the controller? Is it the phy driver? Is
>>> the phy driver not detecting a resume? Or did the resume fail? Does this
>>> occur consistently?
>>>
>>> Thanks,
>>> Thinh
>> This issue occurs occasionally on RK3562 EVB with Type-C USB, and enable
>> autosuspend for dwc3 controller.
>>
>> Here is the test steps:
>>
>> 1. Power on the RK3562 EVB and the Type-C USB interface is in unconnected
>> state.
>>
>> 2. Makesure the dwc3 controller enter runtime suspend, and its power domain
>> is disabled.
>>
>> 3. Do system suspend/resume stress test.
>>
>> 4. The issue occurs occasionally with the following log:
>>
>> [ 251.681091][ T4331] PM: suspend entry (deep)
>> [ 251.778975][ T4331] Filesystems sync: 0.097 seconds
>> [ 251.779025][ T4331] Freezing user space processes ... (elapsed 0.005
>> seconds) done.
>> [ 251.784819][ T4331] OOM killer disabled.
>> [ 251.784851][ T4331] Freezing remaining freezable tasks ... (elapsed 0.004
>> seconds) done.
>> [ 251.792719][ T503] [SKWIFI DBG] skw_suspend: WoW: enabled, skw flags:
>> 0x302
>> [ 251.803701][ T4331] PM: dpm_run_callback():
>> platform_pm_suspend.cfi_jt+0x0/0x8 returns -16
>> [ 251.803779][ T75] PM: PM: Pending Wakeup Sources: alarmtimer.0.auto
>> [ 251.803789][ T4331] PM: Device alarmtimer.0.a
>> [ 251.803928][ T4331] PM: Some devices failed to suspend, or early wake
>> event detected
>> [ 251.804141][ T75] [SKWIFI DBG] skw_resume: skw flags: 0x300
>> [ 251.804715][ C2] SError Interrupt on CPU2, code 0xbf000000 -- SError
>> [ 251.804725][ C2] CPU: 2 PID: 4331 Comm: binder:251_4 Tainted: G
>> WC E 5.10.157-android13-4-00006-g73f337804fbc-ab9881769 #1
>> [ 251.804732][ C2] Hardware name: Rockchip RK3562 RK817 TABLET LP4 Board
>> (DT)
>> [ 251.804738][ C2] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
>> [ 251.804743][ C2] pc : el1_abort+0x40/0x68
>> [ 251.804748][ C2] lr : el1_abort+0x28/0x68
>>
>> ......
>>
>> [ 251.804965][ C2] Kernel panic - not syncing: Asynchronous SError
>> Interrupt
>> [ 251.804974][ C2] CPU: 2 PID: 4331 Comm: binder:251_4 Tainted: G
>> WC E 5.10.157-android13-4-00006-g73f337804fbc-ab9881769 #1
>> [ 251.804980][ C2] Hardware name: Rockchip RK3562 RK817 TABLET LP4 Board
>> (DT)
>> [ 251.804984][ C2] Call trace:
>> [ 251.804990][ C2] dump_backtrace.cfi_jt+0x0/0x8
>> [ 251.804995][ C2] dump_stack_lvl+0xc0/0x13c
>> [ 251.805000][ C2] panic+0x174/0x468
>> [ 251.805006][ C2] arm64_serror_panic+0x1b0/0x200
>> [ 251.805010][ C2] do_serror+0x184/0x1e4
>> [ 251.805016][ C2] el1_error+0x94/0x118
>> [ 251.805020][ C2] el1_abort+0x40/0x68
>> [ 251.805026][ C2] el1_sync_handler+0x58/0x88
>> [ 251.805031][ C2] el1_sync+0x8c/0x140
>> [ 251.805035][ C2] dwc3_readl+0x30/0x1a0
>> [ 251.805040][ C2] dwc3_phy_setup+0x38/0x510
>> [ 251.805045][ C2] dwc3_core_init+0x68/0xcd4
>> [ 251.805051][ C2] dwc3_core_init_for_resume+0x10c/0x25c
>> [ 251.805056][ C2] dwc3_resume_common+0x44/0x3d0
>> [ 251.805061][ C2] dwc3_resume+0x5c/0xb8
>> [ 251.805067][ C2] dpm_run_callback+0x70/0x488
>> [ 251.805071][ C2] device_resume+0x250/0x2f8
>> [ 251.805077][ C2] dpm_resume+0x258/0x9dc
>> [ 251.805082][ C2] suspend_devices_and_enter+0x850/0xcac
>>
>> In this case, during suspend process, because the device alarmtimer failed
>> to suspend, it break the system suspend in the funciton
>> suspend_devices_and_enter(), and goto platform_recover() directly without
>> enable the power domain of the controller, then trigger the Kernel panic in
>> dwc3_resume().
>>
> Thanks for the details.
>
>> For a comparison, in the normal case, if the system enter suspend normally,
>> and after the system wakeup, the power domain of the controller will be
>> enable by the framework of the pm generic domain before dwc3 resume.
>>
>> The function call stack like this:
>>
>> suspend_devices_and_enter -->
>>
>> suspend_enter -->
>>
>> dpm_resume_noirq --> dpm_noirq_resume_devices -->
>> device_resume_noirq --> genpd_resume_noirq --> rockchip_pd_power (enable
>> the power domain of the controller)
>>
>> dpm_resume_end -->
>>
>> dpm_resume --> device_resume --> dpm_run_callback --> dwc3_resume
>> (access the controller safely)
>>
>> dpm_complete --> genpd_complete --> genpd_queue_power_off_work
>>
>> suspend_finish --> suspend_thaw_processes --> genpd_power_off_work_fn -->
>> (diable the power domain of the controller to maintain the original runtime
>> suspend state)
>>
> At what step do we restore the power domain when this happen? Looks like
> there's a missing step in the suspend failure recovery to recover the
> power domain. What we're doing here seems more like a workaround to
> that, which unfortunately makes the code logic looks unclear IMO.
When then suspend successful, the power framework call
dpm_resume_noirq() to enable on the power domain and then call
dpm_complete() -> genpd_power_off_work_fn() to restore the original
power domain state.
When the suspend failure happens,it seems that nobody recover the power
domain in the power framework and rockchip platform pm_domains driver.
>
> Can this be fixed in the lower layer?
For the time being, I haven't found a solution in the lower layer to fix
this issue. I have to disable the dwc3 runtime feature on RK3562 GKI
project.
>
> Thanks,
> Thinh
Powered by blists - more mailing lists