[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.44L0.1812041138440.1537-100000@iolanthe.rowland.org>
Date: Tue, 4 Dec 2018 11:46:37 -0500 (EST)
From: Alan Stern <stern@...land.harvard.edu>
To: Anurag Kumar Vulisha <anuragku@...inx.com>
cc: Felipe Balbi <balbi@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Shuah Khan <shuah@...nel.org>, Johan Hovold <johan@...nel.org>,
Jaejoong Kim <climbbb.kim@...il.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Roger Quadros <rogerq@...com>,
Manu Gautam <mgautam@...eaurora.org>,
"martin.petersen@...cle.com" <martin.petersen@...cle.com>,
Bart Van Assche <bvanassche@....org>,
Mike Christie <mchristi@...hat.com>,
Matthew Wilcox <willy@...radead.org>,
Colin Ian King <colin.king@...onical.com>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"v.anuragkumar@...il.com" <v.anuragkumar@...il.com>,
Thinh Nguyen <thinhn@...opsys.com>,
Tejas Joglekar <tejas.joglekar@...opsys.com>,
Ajay Yugalkishore Pandey <APANDEY@...inx.com>
Subject: RE: [PATCH v7 01/10] usb: gadget: udc: Add timer support for usb
requests
On Tue, 4 Dec 2018, Anurag Kumar Vulisha wrote:
> >> Yes the host can cancel the transfer. This issue originated from the endpoints using
> >bulk
> >> streaming protocol and may not occur with normal endpoints. AFAIK bulk streaming
> >is
> >> gadget driven, where the gadget is allowed to select which stream id transfer the
> >host
> >> should work on . Since the host doesn't have control on when the transfer would be
> >> selected by gadget, it may wait for longer timeouts before cancelling the transfer.
> >
> >You're missing the point. Although the device selects which stream ID
> >gets transferred, the _host_ decides whether a stream transfer should
> >occur in the first place. No matter how many ERDY packets the device
> >controller tries to send, no transfer will occur until the host wants
> >to do it.
> >
> >In this sense, stream transfers (like all other USB interactions except
> >wakeup requests) are host-driven.
> >
>
> I agree with you that without host willing to transfer, the transfer wouldn't have
> started and also agree the host always has the capability of detecting the hang. But
> we are not sure on what host platform and operating system the gadget would be
> tested and also not sure whether the host software is capable of handling the deadlock.
> Even if the host has a timer , it would be very long and waiting for the timer to get
> timedout would reduce the performance. In this patch we are giving the user the
> flexibility to opt the timeout value, so that the deadlock can be avoided without
> effecting the performance. So I think adding the timer in gadget is more easier solution
> than handling in host. I have seen this workaround working for both linux & windows.
Is there any way for the device controller driver to detect the problem
without relying on a timeout?
In fact, is this problem a known bug in the existing device controller
hardware? Have the controller manufacturers published a suggested
scheme for working around it?
> >> >> Since the gadget
> >> >> controller driver is aware that the controller is stuck , makes it responsible
> >> >> to recover the controller from hang condition by restarting the transfer (which
> >> >> triggers the controller FSM to issue ERDY to host).
> >> >
> >> >Isn't there a cleaner way to recover than by cancelling the request and
> >> >resubmitting it?
> >> >
> >>
> >> dequeuing the request issues the stop transfer command to the controller, which
> >> cleans all the hardware resource allocated for that endpoint. This also resets the
> >> hardware FSMs for that endpoint . So, when re-queuing of the transfer happens
> >> the controller starts allocating hardware resources again, thus avoiding the
> >probability
> >> of entering into the issue. I am not sure of other controllers, but for dwc3, issuing
> >> the stop transfer is the only way to handle this issue.
> >
> >Again you're missing the point. Can't the controller driver issue the
> >Stop Transfer command but still consider the request to be in progress
> >(i.e., don't dequeue the request) so that the gadget driver's
> >completion callback isn't invoked and the request does not need to be
> >explicitly resubmitted?
> >
>
> Yes , what you are saying is correct. My initial patches were following the
> same approach that you have suggested. We switched to the dequeue
> approach because there can be different gadgets which may enter into
> this issue and it would be better to follow a generic approach rather than
> having every controller driver implementing their own workaround.
> Please find the conservation in this link https://patchwork.kernel.org/patch/10640145/
At this point, it seems that the generic approach will be messier than
having every controller driver implement its own fix. At least, that's
how it appears to me.
> >> @Felipe: Can you please provide your suggestion on this.
> >
> >> >How can the gadget driver know what timeout to use? The host is
> >> >allowed to be as slow as it wants; the gadget driver doesn't have any
> >> >way to tell when the host wants to start the transfer.
> >>
> >> Yes , I agree with you that the timeout may vary from usage to usage. This timeout
> >> should be decided by the class driver which queues the request. As discussed above
> >> this issue was observed in streaming protocol , which is very much faster than
> >normal
> >> BOT modes and it works on super speed .
> >
> >Although USB mass storage is currently the only user of the stream
> >protocol, that may not be true in the future. You should think in more
> >general terms. A timeout which is appropriate for mass storage may not
> >be appropriate for some other application.
> >
>
> You are correct , this timeout may not be ideal. Since I was not able to reproduce this issue
> on non-stream capable transfers , at present the only user of usb_ep_queue_timeout() API
> is the streaming gadget. As we are not aware of the optimal timeout value, instead of
> hardcoding the timeout value we can add module_param() for it. So that the user can configure
> timeout based on their use case. Please let me know your suggestion on this.
Ideally it would not be necessary to rely on a timeout at all.
Also, maintainers dislike module parameters. It would be better not to
add one.
Alan Stern
Powered by blists - more mailing lists