linux-kernel - Re: [PATCH v2] usb: dwc3: gadget: check drained isoc ep

Message-ID: <ZjbIeib2UMta7FbY@pengutronix.de>
Date: Sun, 5 May 2024 01:44:58 +0200
From: Michael Grzeschik <mgr@...gutronix.de>
To: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] usb: dwc3: gadget: check drained isoc ep

On Wed, Apr 24, 2024 at 01:51:01AM +0000, Thinh Nguyen wrote:
>On Tue, Apr 23, 2024, Michael Grzeschik wrote:
>> Hi Thinh,
>>
>> On Thu, Apr 04, 2024 at 12:29:14AM +0000, Thinh Nguyen wrote:
>> > On Tue, Apr 02, 2024, Thinh Nguyen wrote:
>> > > On Tue, Apr 02, 2024, Thinh Nguyen wrote:
>> > > > My concern here is for the case where transfer_in_flight == true and
>> > >
>> > > I mean transfer_in_flight == false
>> > >
>> > > > list_empty(started_list) == false. That means that the requests in the
>> > > > started_list are completed but are not given back to the gadget driver.
>> > > >
>> > > > Since they remained in the started_list, they will be resubmitted again
>> > > > on the next usb_ep_queue. We may send duplicate transfers, right?
>> >
>> > Actually, since the requests are completed, the HWO bits are cleared,
>> > so nothing is resubmitted and there are no duplicates. But since the
>> > requests have not yet been given back from the started_list, the next
>> > Start_Transfer command will begin at the TRB address of a completed
>> > request (HWO=0), and the controller may not process the following
>> > TRBs. Have you tested this scenario?
>> >
>> > > >
>> > > > You can try to cleanup requests in the started_list, but you need to be
>> > > > careful to make sure you're not out of sync with the transfer completion
>> > > > events and new requests from gadget driver.
>> > > >
>> >
>> > Was the problem you encountered due to the no_interrupt setting, where
>> > it was set on the last request of the UVC data pump?
>> >
>> > If that's the case, can the UVC function driver make sure not to set
>> > no_interrupt on the last request of its data pump?
>>
>> Actually, no. What I want is to ensure that the dwc3 stream is
>> stopped once the hardware has drained, which is a valid concern in
>> my case, since we potentially enqueue new requests in the complete
>> handler, be they zero-length or real transfers.
>>
>> Calling kick_transfer on drained hardware will certainly result in
>> missed isoc transfers if the IRQ thread runs late. We saw this on real
>> hardware, where another irq_thread was scheduled with the same priority
>> as the dwc3 irq_thread but ran so long that the hardware ran dry
>> between the hardware IRQ and the actual dwc3_irq_thread run.
>>
>
>Right. Unfortunately, dwc3 can only "guess" whether the UVC function has
>stopped pumping more requests or whether it's due to some large latency.
>The logic to work around this underrun issue will not be foolproof.
>Perhaps we can improve upon it, but the solution is better implemented
>in the UVC function driver.

Yes, the best way to solve this is in the uvc driver.

>I thought we now have a mechanism in the UVC function to ensure that
>enough zero-length requests are queued to account for underrun/latency
>issues? What's the issue now?

This is actually only partially true. Even with the zero-length packets,
it is possible to run into underruns. This is why we implemented this
patch. It happened because another interrupt thread with the same
priority, on the same CPU as the dwc3 interrupt thread, was keeping the
CPU busy. By the time the dwc3 interrupt thread got to run, it was
already too late: the hardware had drained, although the started list
was not yet empty, which caused the next requests (zero-length or not)
to be queued too late.
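To make the timing argument concrete, here is a minimal sketch under a simplified model (an assumption, not dwc3 code): the controller consumes one queued isoc request per service interval, and nothing new is queued until the dwc3 IRQ thread runs and the complete handler refills the pump. The function name hyp_underrun_expected is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: the already-queued requests keep the hardware busy
 * for queued_requests * interval_us microseconds. If the dwc3 IRQ thread
 * is delayed at least that long, the hardware has drained before the
 * complete handler can queue anything new. Arriving exactly at the drain
 * point is treated as too late.
 */
static bool hyp_underrun_expected(unsigned int queued_requests,
                                  unsigned int interval_us,
                                  unsigned int irq_thread_delay_us)
{
    unsigned long long covered_us;

    assert(interval_us > 0); /* a service interval must be non-zero */

    covered_us = (unsigned long long)queued_requests * interval_us;
    return irq_thread_delay_us >= covered_us;
}
```

For example, eight requests at 125 µs each (one high-speed microframe per request) cover 1 ms; an IRQ thread delayed by 2 ms would find the hardware already drained, while one delayed by 0.5 ms would not. Queuing more zero-length requests only widens that window; it cannot rule out an arbitrarily long delay.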

Yes, this needed to be solved at the upper level first, by moving the
long-running work of the other interrupt thread to another thread or
even into userspace.

However, I thought it would be great if we could somehow detect this in
the dwc3 core and make the pump mechanism more robust against such
late enqueues.
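The idea of detecting a drained endpoint in the core can be sketched roughly as follows. This is an illustrative simplification, not the actual patch: struct hyp_trb, HYP_TRB_CTRL_HWO and hyp_isoc_ep_drained are hypothetical names, and the HWO bit is only assumed to sit at bit 0 of the TRB control word. The check treats the endpoint as drained when the most recently prepared TRB on the started list is no longer owned by hardware, in which case a caller would end the transfer and restart it instead of just appending new TRBs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TRB: only the control word matters here. */
struct hyp_trb {
    uint32_t ctrl;
};

/* Assumed to mirror the HWO (Hardware Owns) bit of the TRB control word. */
#define HYP_TRB_CTRL_HWO (1u << 0)

/*
 * Naive drained check on the TRBs of the started list (oldest first).
 * An empty started list is a different case (no transfer to drain) and
 * is reported as "not drained" here.
 */
static bool hyp_isoc_ep_drained(const struct hyp_trb *started, size_t n_started)
{
    assert(started != NULL || n_started == 0);

    if (n_started == 0)
        return false;

    /* Newest TRB no longer owned by hardware: the ring ran dry. */
    return (started[n_started - 1].ctrl & HYP_TRB_CTRL_HWO) == 0;
}
```

Note that this single read of the newest TRB is exactly the kind of check that, per the rest of this mail, turned out to be unreliable in practice.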

This all started with this series:

https://lore.kernel.org/all/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@pengutronix.de/

Patch 2 of that series has worked well so far. This patch was the next
step.

While debugging over the last week, we found that this patch has other
issues: it is not always safe for the driver to read the HWO bit.

It turns out that after a new TRB has been prepared with the HWO bit set,
it is not safe to read that value back immediately, as the hardware will
be doing some operations on exactly that newly prepared TRB.

We ran into this problem when applying this patch. The TRB list was
actually filled, but we hit a false positive where the HWO bit of the
latest TRB read as 0 (probably due to hardware activity in the
background), and the driver therefore went into End Transfer.
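The false positive can be illustrated with a small model. This is a hypothetical sketch, not something proposed in the thread: the names are invented, and it assumes the controller walks the TRBs of the started list in order. Under that assumption, a newest TRB reading HWO=0 while an earlier TRB still reads HWO=1 is inconsistent, so a stricter check that requires every started TRB to be released would reject the reading described above. It narrows that specific case but does not by itself make reading HWO safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TRB with only the control word. */
struct hyp_trb {
    uint32_t ctrl;
};

/* Assumed to mirror the HWO (Hardware Owns) bit. */
#define HYP_TRB_CTRL_HWO (1u << 0)

/* Naive: trusts a single read of the newest TRB's HWO bit. */
static bool hyp_drained_naive(const struct hyp_trb *t, size_t n)
{
    assert(t != NULL || n == 0);
    return n > 0 && (t[n - 1].ctrl & HYP_TRB_CTRL_HWO) == 0;
}

/*
 * Stricter variant: report "drained" only if no TRB on the started list
 * is still owned by hardware. An earlier TRB with HWO set means the
 * hardware still has work queued, whatever the newest TRB reads as.
 */
static bool hyp_drained_strict(const struct hyp_trb *t, size_t n)
{
    size_t i;

    assert(t != NULL || n == 0);
    if (n == 0)
        return false;

    for (i = 0; i < n; i++) {
        if (t[i].ctrl & HYP_TRB_CTRL_HWO)
            return false;
    }
    return true;
}
```

With a filled ring whose newest TRB transiently reads HWO=0, the naive check reports "drained" (and would trigger End Transfer), whereas the stricter one does not.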

Michael

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
