[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7efa451e-2601-f448-af33-b844091db264@quicinc.com>
Date: Tue, 20 Jun 2023 10:23:55 +0530
From: Krishna Kurapati PSSNV <quic_kriskura@...cinc.com>
To: Johan Hovold <johan@...nel.org>,
Thinh Nguyen <Thinh.Nguyen@...opsys.com>
CC: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
<linux-usb@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<quic_ppratap@...cinc.com>, <quic_wcheng@...cinc.com>,
<quic_jackp@...cinc.com>, <quic_ugoswami@...cinc.com>
Subject: Re: [PATCH v3] usb: dwc3: gadget: Propagate core init errors to UDC
during pullup
On 6/19/2023 8:36 PM, Johan Hovold wrote:
> On Mon, Jun 19, 2023 at 06:20:43PM +0530, Krishna Kurapati PSSNV wrote:
>> On 6/19/2023 12:36 PM, Johan Hovold wrote:
>>> On Sun, Jun 18, 2023 at 05:39:49PM +0530, Krishna Kurapati wrote:
>
>>>> @@ -2747,7 +2747,9 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
>>>> ret = pm_runtime_get_sync(dwc->dev);
>>>> if (!ret || ret < 0) {
>>>> pm_runtime_put(dwc->dev);
>>>> - return 0;
>>>> + if (ret < 0)
>>>> + pm_runtime_set_suspended(dwc->dev);
>>>
>>> This bit is broken and is also not mentioned or explained in the commit
>>> message. What are you trying to achieve here?
>>>
>>> You cannot set the state like this after runtime PM is enabled and the
>>> above call will always fail.
>
>> The reason why I an returning ret is because, when the first get_sync
>> fails because of core_init failure and we return 0 instead of ret, the
>> UDC thinks that controller has started successfully but we never set the
>> run stop bit.
>
> That bit is clear.
>
>> So when we plug out the cable, the disconnect event won't
>> be generated and we never send on systems like android the user space
>> will never clear the UDC upon disconnect. Its a sort of mismatch between
>> controller and udc.
>
> Ok, but the controller is an error state after the resume failure. And
> here you rely on user space to retry gadget activation in order to
> eventually detect the disconnect event?
>
>> Also once the first get_sync fails, the dwc->dev->power.runtime_error
>> flag is set and successive calls to get_sync always return -EINVAL. In
>> this situation even if UDC/configfs retry pullup, resume_common will
>> never be called and we never actually start the controller or resume
>> dwc->dev.
>>
>> By calling set_suspended, I am trying to clear the runtime_error flag so
>> that the next retry to pullup will call resume_common and retry
>> core_init and set run_stop.
>
> Ok, thanks, that's the bit I was missing in the commit message.
>
> First, I perhaps mistakingly thought pm_runtime_set_suspended() may only
> be called with PM runtime disabled, but it appears it may indeed be
> valid to call also after an error but with the caveat that the device
> must then actually be in the suspended state.
>
> The documentation and implementation is inconsistent here as the kernel
> doc for pm_runtime_set_suspended() clearly states:
>
> It is not valid to call this function for devices with runtime
> PM enabled.
>
> and it also looks like we'd end up with an active-child counter
> imbalance if anyone actually tries to do so.
>
> But either way, it also seems like the controller is not guaranteed to
> be suspended here as pm_runtime_get_sync() may also fail after a
> previous errors that have left the controller in the active state?
> > Also, what kind of errors would cause core_init and resume to fail here?
>
Hi Johan,
As per the comment just above the get_sync during pullup, the
resume_common path is used to resume the controller and start peripheral
mode incase the dwc3 was in suspended state. So if we are entering
gadget_resume we are in suspended state the first time it is called.
Regarding the errors that might leave controller in active state, I have
faced issue in core init, and in that function there is a cleanup
happening in case something fails. So controller was actually not in
active state after cleanup was done. In resume common, if
core_init_for_resume is failing, we cleanup everything initialized up
until that point.
The scenario you mentioned would be applicable in case gadget_resume
fails. We are not having a return value check and I am not sure what
would be the side effect of not having that check there. Either ways,
since it was failing for core init, I went ahead and made this patch.
As per the reason for failure in core init, the following is what was
happening at customer's end:
1. Cable plug-in
2. get_sync calls resume common which inturn calls core_init
3. core soft reset fails in core init, we cleanup and return -110
4. After applying this patch, the -110 was propagated to UDC properly
5. We got a second call to pullup via connect_control and this time
reset was successful.
The behavior was similar to [1]. There as well, on all Gen-2 targets,
after the retry happens I see soft reset is passing but failing for
first attempt.
[1]:
https://lore.kernel.org/all/20230510075252.31023-2-quic_kriskura@quicinc.com/
Regards,
Krishna,
> If this is something that you see during normal operation then this
> seems to suggest that something is wrong with the runtime pm
> implementation.
>
> Note that virtually all drivers treat resume failures as fatal errors
> and do not implement any recovery from that.
>
> In fact, the only other example of this kind of usage that I could find
> is also for a Qualcomm driver...
>
> Johan
Powered by blists - more mailing lists