Message-ID: <20240222011955.7sida4udjlvrlue7@synopsys.com>
Date: Thu, 22 Feb 2024 01:20:04 +0000
From: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
To: Michael Grzeschik <mgr@...gutronix.de>
CC: Dan Vacura <w36195@...orola.com>, Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
Daniel Scally <dan.scally@...asonboard.com>,
Jeff Vanhoof <qjv001@...orola.com>,
"stable@...r.kernel.org" <stable@...r.kernel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jonathan Corbet <corbet@....net>,
Laurent Pinchart <laurent.pinchart@...asonboard.com>,
Felipe Balbi <balbi@...nel.org>,
Paul Elder <paul.elder@...asonboard.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of
release after missed isoc
On Thu, Feb 22, 2024, Michael Grzeschik wrote:
> Sorry for digging up this grave! :)
>
> I once more came across the whole situation we have been encountering
> for a year or so, and found some reasons why:
>
> #1 there are so many latencies that the system is not fast enough to
> enqueue requests back into a running HW-Transfer. At least on our
> system setup.
>
> and
>
> #2 there are so many missed transfers leading to broken frames
> when adding requests with no_interrupt set.
>
> For #1: There sometimes are situations in the system where the threaded
> interrupt handler for the dwc3 is not called fast enough, although the
> HW-irq was called early and enqueued the irq event and woke the irq
> thread early. In our case this often happens when there are other tasks
> involved on the same CPU and the scheduler is not able to schedule the
> irq thread in the necessary time. In our case the main issue is a
> HW-irq handler of the ethernet controller (cadence macb) that runs
> berserk on CPU0 and therefore takes a lot of CPU time. By default, on
> our system all irq handlers run on the same CPU. Since by definition
> each interrupt thread is started on the same CPU on which its irq was
> called, this puts a lot of pressure on one core. So changing the
> smp_affinity of the dwc3 irq to the second CPU only already solves
> a lot of the underruns.
That's great!
>
> For #2: I found an issue in the handling of the completion of requests in
> the started list. The interrupt handler *explicitly* calls
> stop_active_transfer if the overall event of the request was a missed
> event. This event value only represents the value of the request that
> actually triggered the interrupt.
>
> It also calls ep_cleanup_completed_requests, which iterates over the
> started requests and calls the giveback/complete functions of the
> requests with the proper request status.
>
> So this will also catch missed requests in the queue. However, since
> there might be, let's say, 5 good requests and one missed request, what
> will happen is that each complete call for the first good requests will
> enqueue new requests into the started list and will also call
> updatecmd on the transfer that was already missed, until the loop
> reaches the one request with the MISSED status bit set.
>
> So in my opinion the patch from Jeff makes sense when adding the
> following change as well. With both changes the underruns and
> broken frames finally disappear. I am still unsure whether this is the
> complete solution, since with this the mentioned 5 good requests
> will be cancelled as well. So this is still a WIP status here.
>
When the dwc3 driver issues stop_active_transfer(), that means that the
started_list is empty and there is an underrun. It treats the incoming
requests as stale. However, for UVC, they are still "good".

I think you can just check if the started_list is empty before queuing
new requests. If it is, perform stop_active_transfer() to reschedule the
incoming requests. None of the newly queued requests will be released
yet since they are in the pending_list.
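
To illustrate the check described above, here is a rough, untested sketch
written as a stand-alone toy model. It does not use the real dwc3
structures or functions: struct toy_ep, toy_ep_queue() and the other
toy_* names are made up for this example, and the lists are reduced to
simple counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an isoc endpoint; NOT the real struct dwc3_ep. */
struct toy_ep {
	size_t started;        /* requests already handed to the controller */
	size_t pending;        /* requests queued but not yet started */
	bool transfer_active;  /* a HW transfer is currently running */
	unsigned int restarts; /* how often the transfer was stopped */
};

/* Hypothetical stand-in for stop_active_transfer(): end the running
 * transfer so that pending requests get rescheduled. */
static void toy_stop_active_transfer(struct toy_ep *ep)
{
	ep->transfer_active = false;
	ep->restarts++;
}

/* Hypothetical stand-in for starting/updating the transfer: everything
 * on the pending list moves to the started list. */
static void toy_kick_transfer(struct toy_ep *ep)
{
	ep->started += ep->pending;
	ep->pending = 0;
	ep->transfer_active = true;
}

/* Queue one request. If a transfer is running but the started list is
 * empty, that is an underrun: stop the transfer first so the new request
 * is rescheduled instead of being added to a transfer that already
 * missed its interval. */
static void toy_ep_queue(struct toy_ep *ep)
{
	if (ep->transfer_active && ep->started == 0)
		toy_stop_active_transfer(ep);

	ep->pending++;
	toy_kick_transfer(ep);
}

/* Complete (give back) one started request. */
static void toy_complete_one(struct toy_ep *ep)
{
	assert(ep->started > 0);
	ep->started--;
}
```

In this model the request queued after an underrun sits on the pending
side until the transfer has been stopped, so it is never given back as
stale.
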
For UVC, perhaps you can introduce a new flag to usb_request called
"ignore_queue_latency" or something equivalent. The dwc3 is already
partially doing this for UVC. With this new flag, we can rework dwc3 to
clearly separate the expected behavior from the function driver.
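
As a purely illustrative sketch of what such a flag could express: the
struct below is a toy, not the real struct usb_request, and the flag name
is only the suggestion from above; nothing like it exists in the gadget
API today. The toy_req_* names and the enum are likewise invented for
the example.

```c
#include <stdbool.h>

/* Toy stand-in for a request; NOT the real struct usb_request. */
struct toy_request {
	unsigned no_interrupt:1;         /* mirrors the existing real flag */
	unsigned ignore_queue_latency:1; /* hypothetical new flag */
};

/* What the controller driver does with a request that arrives after an
 * underrun (invented for this example). */
enum toy_req_action {
	TOY_REQ_DROP_STALE, /* treat the late request as stale */
	TOY_REQ_RESCHEDULE, /* late data is still good: reschedule it */
};

/* The function driver states its expectation through the flag, so the
 * controller driver no longer has to guess what the function wants. */
static enum toy_req_action toy_req_on_underrun(const struct toy_request *req)
{
	return req->ignore_queue_latency ? TOY_REQ_RESCHEDULE
					 : TOY_REQ_DROP_STALE;
}
```

A function driver like UVC would set the flag on its requests; drivers
that need strict timing would leave it clear.
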
BR,
Thinh