[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <281aae0d-6146-bf65-cf23-2ddf7e16ae1c@quicinc.com>
Date: Tue, 23 Aug 2022 15:06:17 -0700
From: Elson Serrao <quic_eserrao@...cinc.com>
To: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
CC: "balbi@...nel.org" <balbi@...nel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
"quic_wcheng@...cinc.com" <quic_wcheng@...cinc.com>,
"quic_jackp@...cinc.com" <quic_jackp@...cinc.com>,
"quic_mrana@...cinc.com" <quic_mrana@...cinc.com>
Subject: Re: [PATCH 2/5] usb: gadget: Add function wakeup support
On 8/22/2022 6:01 PM, Thinh Nguyen wrote:
> On Thu, Aug 18, 2022 at 11:17:24AM -0700, Elson Serrao wrote:
>>
>>
>> On 8/16/2022 4:51 PM, Thinh Nguyen wrote:
>>> On 8/16/2022, Elson Serrao wrote:
>>>>
>>>>
>>>> On 8/12/2022 5:46 PM, Thinh Nguyen wrote:
>>>>> On 8/11/2022, Thinh Nguyen wrote:
>>>>>> On 8/11/2022, Thinh Nguyen wrote:
>>>>>>> On 8/11/2022, Elson Serrao wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/9/2022 6:08 PM, Thinh Nguyen wrote:
>>>>>>>
>>>>>>> <snip>
>>>>>>>
>>>>>>>
>>>>>>>>> To summarize the points:
>>>>>>>>>
>>>>>>>>> 1) The host only arms function remote wakeup if the device is
>>>>>>>>> capable of
>>>>>>>>> remote wakeup (check USB_CONFIG_ATT_WAKEUP in bmAttributes and
>>>>>>>>> hardware
>>>>>>>>> capability)
>>>>>>>>>
>>>>>>>>> 2) If the device is in suspend, the device can do remote wakeup
>>>>>>>>> (through
>>>>>>>>> LFPS handshake) if the function is armed for remote wakeup (through
>>>>>>>>> SET_FEATURE(FUNC_SUSPEND)).
>>>>>>>>>
>>>>>>>>> 3) If the link transitions to U0 after the device triggering a remote
>>>>>>>>> wakeup, the device will also send device notification function
>>>>>>>>> wake for
>>>>>>>>> all the interfaces armed with remote wakeup.
>>>>>>>>>
>>>>>>>>> 4) If the device is not in suspend, the device can send device
>>>>>>>>> notification function wake if it's in U0.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Now, remote wakeup and function wake device notification are 2
>>>>>>>>> separate
>>>>>>>>> operations. We have the usb_gadget_ops->wakeup() for remote wakeup. I
>>>>>>>>> suggested to maybe add
>>>>>>>>> usb_gadget_ops->send_wakeup_notification(gadget,
>>>>>>>>> intf_id) for the device notification. What you did was combining both
>>>>>>>>> operations in usb_gadget_ops->func_wakeup(). That may only work for
>>>>>>>>> point 4) (assuming you fix the U0 check), but not point 3).
>>>>>>>>
>>>>>>>> Thank you for your feedback and summary. I will rename func_wakeup to
>>>>>>>> send_wakeup_notification to better align with the approach. The
>>>>>>>> reason I
>>>>>>>> have combined remote_wakeup and function wake notification in
>>>>>>>> usb_gadget_ops->func_wakeup() is because since the implementation
>>>>>>>> is at
>>>>>>>> function/composite level it has no knowledge on the link state. So I
>>>>>>>> have delegated that task to controller driver to handle the
>>>>>>>> notification
>>>>>>>> accordingly. That is do a LFPS handshake first if the device is
>>>>>>>> suspended and then send notification (explained below). But we can
>>>>>>>> definitely separate this by adding an additional flag in the composite
>>>>>>>> layer to set the link state based on the gadget suspend callback
>>>>>>>> called
>>>>>>>> when U3 SUSPEND interrupt is received. Let me know if you feel
>>>>>>>> separating the two is a better approach.
>>>>>>>>
>>>>>>>
>>>>>>> The reason I think we need to separate it is because of point 3. As I
>>>>>>> note earlier, the spec states that "If remote wake event occurs in
>>>>>>> multiple functions, each function shall send a Function Wake
>>>>>>> Notification."
>>>>>>>
>>>>>>> But if there's no remote wake event, and the host brought the device up
>>>>>>> instead, then the function suspend state is retained.
>>>>>>>
>>>>>>> If we separate these 2 operations, the caller can check whether the
>>>>>>> operation went through properly. For example, if the wakeup() is
>>>>>>> initiated properly, but the function wake device notification didn't go
>>>>>>> through. We would only need to resend the device notification rather
>>>>>>> than initiate remote wakeup again.
>>>>>>
>>>>>> If we don't have to send device notification for other interfaces, we
>>>>>> can combine the operations here as you did.
>>>>>>
>>>>>
>>>>> I still think it's better to split up the operations. The way you're
>>>>> handling it now is not clear.
>>>>>
>>>>> If the func_awake() returns -EAGAIN, I'd expect that the remote wake did
>>>>> not go through and expect user to retry again. But here it does initiate
>>>>> remote wake, but it just does not send device notification yet. This is
>>>>> confusing.
>>>>>
>>>>> Also, instead of all the function wake handling coming from the function
>>>>> driver, now we depend on the controller driver to call function resume()
>>>>> on state change to U0, which will trigger device notification. What
>>>>> happen if it doesn't call resume(). There's too many dependencies and it
>>>>> seems fragile.
>>>>>
>>>>> I think all this can be handled in the function driver. You can fix the
>>>>> dwc3 wakeup() and poll for U0/ON state rather than RECOVERY state, which
>>>>> is what it's supposed to poll.
>>>>
>>>> For transitioning from U3 to U0, the current upstream implementation is
>>>> to poll for U0 state when dwc3_gadget_wakeup() is called and it is a
>>>> blocking call. (this is a common API for both HS and SS)
>>>>
>>>> /* poll until Link State changes to ON */
>>>> retries = 20000;
>>>>
>>>> while (retries--) {
>>>> reg = dwc3_readl(dwc->regs, DWC3_DSTS);
>>>>
>>>> /* in HS, means ON */
>>>> if (DWC3_DSTS_USBLNKST(reg) == DWC3_LINK_STATE_U0)
>>>> break;
>>>> }
>>>>
>>>> In my experiments I found that certain hosts take longer time to drive
>>>> HS resume signalling between the remote wakeup and the resume K and this
>>>> time varies across hosts. Hence the above polling timer is not generic
>>>> across hosts. So how do we converge on a polling timer value to work
>>>> across HS/SS and without blocking the system for a long time?
>>>
>>> Can't we take the upper limit of both base on experiment? And it
>>> shouldn't be blocking the whole system.
>>
>> On the host I was experimenting with, the time it took was around 110ms in
>> HS case. That would translate to a retry count of about ~181,000 in the
>> above polling loop. Wouldn't that be a very large value for polling?
>> And not sure if there are other hosts that take even longer time
>
> We don't want to poll that many times. We shouldn't depend on the delay
> of a register read for polling interval. Can't we just sleep in between
> interval at a reasonable interval.
>
Sleeping is not an option as the upper layers (those beyond
function/composite layer) may transmit data with a lock held.
I ran into below BUG when remote wakeup was triggered via a ping()
command and attempted sleep while polling
[ 88.676789][ T392] BUG: scheduling while atomic
[ 88.900112][ T392] Call trace:
<snip>
[ 88.912760][ T392] __schedule_bug+0x90/0x188
[ 88.917335][ T392] __schedule+0x714/0xb4c
[ 88.930568][ T392] schedule+0x110/0x204
[ 88.943620][ T392] schedule_timeout+0x94/0x134
[ 88.948371][ T392] __dwc3_gadget_wakeup+0x1ac/0x558
[ 88.953558][ T392] dwc3_gadget_wakeup+0x3c/0x8c
[ 88.958388][ T392] usb_gadget_wakeup+0x54/0x1a8
[ 88.963222][ T392] eth_start_xmit+0x130/0x830
[ 88.967876][ T392] xmit_one+0xf0/0x364
[ 88.971913][ T392] sch_direct_xmit+0x188/0x3e4
[ 88.976663][ T392] __dev_xmit_skb+0x480/0x984
[ 88.981319][ T392] __dev_queue_xmit+0x2d4/0x7d8
[ 88.986151][ T392] neigh_resolve_output+0x208/0x2f0
<snip>
The above experiment was done by removing spin_locks if any in the
wakeup() path of function/composite/controller drivers. It is running
in atomic context due to the lock held by linux networking stack/generic
packet scheduler.
So below are the only two approaches I can think of for
dwc3_gadget_wakeup() API
1.)Polling based approach: We poll until the link comes up. But cannot
sleep while polling due to above reasons.
2.)Interrupt based approach (the patches being discussed currently):
When a remote wakeup is triggered enable link state interrupts and
return right away. The function/composite drivers are later notified
about the ON event via resume callback (in a similar way how we notify
U3 to composite layer via suspend callback).
Please let me know if there is any alternate way that you can think of here.
>>>
>>>>
>>>> Some data layers like TCP/IP hold a TX lock while sending data (that
>>>> causes a remote wakeup event) and hence this wakeup() may run in atomic
>>>> context.
>>>
>>> Why hold the lock while waiting for the host to wakeup? The host is
>>> still inactive. Also, the usb_gadget_wakeup() API doesn't specify that
>>> it may run in atomic context.
>>>
>>
>> The lock might be held by upper layers who are unaware/independent of
>> underlying transport medium. The above TX lock I was referring to was
>> that held by Linux networking stack. It just pushes out data the same way it
>> would when USB is active. It is the function/composite layer being aware of
>> the function suspend would now sense this as a remote wake event and perform
>> this additional step of bringing out the link from u3 and then sending
>> device wakeup notification.
>>
>> In our current upstream implementation of dwc3_gadget_wakeup() API we hold a
>> spinlock as well. But yeah that can be rectified
>
> Holding a spin_lock for this long is not reasonable. We only need to
> lock when setting link recovery request but not while polling for DSTS
> and waiting for the link to go up.
>
> BR,
> Thinh
>
>>
>> static int dwc3_gadget_wakeup(struct usb_gadget *g)
>> {
>> struct dwc3 *dwc = gadget_to_dwc(g);
>> unsigned long flags;
>> int ret;
>>
>> spin_lock_irqsave(&dwc->lock, flags);
>> ret = __dwc3_gadget_wakeup(dwc);
>> spin_unlock_irqrestore(&dwc->lock, flags);
>>
>> return ret;
>> }
>>
>>
>>>>
>>>> To make this generic across hosts, I had switched to interrupt based
>>>> approach, enabling link state events and return error value immediately
>>>> from the dwc3_gadget_wakeup() API after doing a LFPS handshake. But
>>>> yeah, then we have to rely on the resume callback as an indication that
>>>> link is transitioned to ON state.
>>>>
>>>
>>> BR,
>>> Thinh
Powered by blists - more mailing lists