lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 29 May 2024 23:24:18 +0200
From: Michael Grzeschik <mgr@...gutronix.de>
To: Avichal Rakesh <arakesh@...gle.com>
Cc: Alan Stern <stern@...land.harvard.edu>,
	Laurent Pinchart <laurent.pinchart@...asonboard.com>,
	Daniel Scally <dan.scally@...asonboard.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Jayant Chowdhary <jchowdhary@...gle.com>,
	"etalvala@...gle.com" <etalvala@...gle.com>,
	Michael Riesch <michael.riesch@...fvision.net>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Thinh Nguyen <Thinh.Nguyen@...opsys.com>
Subject: Re: [PATCH 0/3] usb: gadget: uvc: allocate requests based on frame
 interval length and buffersize

On Tue, May 28, 2024 at 05:33:46PM -0700, Avichal Rakesh wrote:
>
>
>On 5/28/24 15:43, Michael Grzeschik wrote:
>> On Tue, May 28, 2024 at 02:27:34PM -0700, Avichal Rakesh wrote:
>>>
>>>
>>> On 5/28/24 13:22, Michael Grzeschik wrote:
>>>> On Tue, May 28, 2024 at 10:30:30AM -0700, Avichal Rakesh wrote:
>>>>>
>>>>>
>>>>> On 5/22/24 10:37, Michael Grzeschik wrote:
>>>>>> On Wed, May 22, 2024 at 05:17:02PM +0000, Thinh Nguyen wrote:
>>>>>>> On Wed, May 22, 2024, Alan Stern wrote:
>>>>>>>> On Wed, May 22, 2024 at 01:41:42AM +0000, Thinh Nguyen wrote:
>>>>>>>> > On Wed, May 22, 2024, Michael Grzeschik wrote:
>>>>>>>> > > On Fri, May 17, 2024 at 01:44:05AM +0000, Thinh Nguyen wrote:
>>>>>>>> > > > For isoc endpoint IN, yes. If the host requests for isoc data IN while
>>>>>>>> > > > no TRB is prepared, then the controller will automatically send 0-length
>>>>>>>> > > > packet respond.
>>>>>>>> > >
>>>>>>>> > > Perfect! This will help a lot and will make active queueing of own
>>>>>>>> > > zero-length requests run unnecessary.
>>>>>>>> >
>>>>>>>> > Yes, if we rely on the current start/stop isoc transfer scheme for UVC,
>>>>>>>> > then this will work.
>>>>>>>>
>>>>>>>> You shouldn't rely on this behavior.  Other device controllers might not
>>>>>>>> behave this way; they might send no packet at all to the host (causing a
>>>>>>>> USB protocol error) instead of sending a zero-length packet.
>>>>>>>
>>>>>>> I agree. The dwc3 driver has this workaround to somewhat work with the
>>>>>>> UVC. This behavior is not something the controller expected, and this
>>>>>>> workaround should not be a common behavior for different function
>>>>>>> driver/protocol. Since this behavior was added a long time ago, it will
>>>>>>> remain the default behavior in dwc3 to avoid regression with UVC (at
>>>>>>> least until the UVC is changed). However, it would be nice for UVC to
>>>>>>> not rely on this.
>>>>>>
>>>>>> With "this" you mean exactly the following commit, right?
>>>>>>
>>>>>> (f5e46aa4 usb: dwc3: gadget: when the started list is empty stop the active xfer)
>>>>>>
>>>>>> When we start questioning this, then lets dig deeper here.
>>>>>>
>>>>>> With the fast datarate of at least usb superspeed shouldn't they not all
>>>>>> completely work asynchronous with their in flight trbs?
>>>>>>
>>>>>> In my understanding this validates that, with at least superspeed we are
>>>>>> unlikely to react fast enough to maintain a steady isoc dataflow, since
>>>>>> the driver above has to react to errors in the processing context.
>>>>>>
>>>>>> This runs the above patch (f5e46aa4) a gadget independent solution
>>>>>> which has nothing to do with uvc in particular IMHO.
>>>>>>
>>>>>> How do other controllers and their drivers work?
>>>>>>
>>>>>>> Side note, when the dwc3 driver reschedules/starts isoc transfer again,
>>>>>>> the first transfer will be scheduled go out at some future interval and
>>>>>>> not the next immediate microframe. For UVC, it probably won't be a
>>>>>>> problem since it doesn't seem to need data going out every interval.
>>>>>>
>>>>>> It should not make a difference. [TM]
>>>>>>
>>>>>
>>>>>
>>>>> Sorry for being absent for a lot of this discussion.
>>>>>
>>>>> I want to take a step back from the details of how we're
>>>>> solving the problem to what problems we're trying to solve.
>>>>>
>>>>> So, question(s) for Michael, because I don't see an explicit
>>>>> answer in this thread (and my sincerest apologies if they've
>>>>> been answered already and I missed it):
>>>>>
>>>>> What exactly is the bug (or bugs) we're trying to solve here?
>>>>>
>>>>> So far, it looks like there are two main problems being
>>>>> discussed:
>>>>>
>>>>> 1. Reducing the bandwidth usage of individual usb_requests
>>>>> 2. Error handling for when transmission over the wire fails.
>>>>>
>>>>> Is that correct, or are there other issues at play here?
>>>>
>>>> That is correct.
>>>>
>>>>> (1) in isolation should be relatively easy to solve: Just
>>>>> smear the encoded frame across some percentage
>>>>> (prefereably < 100%) of the usb_requests in one
>>>>> video frame interval.
>>>>
>>>> Right.
>>>>
>>>>> (2) is more complicated, and your suggestion is to have a
>>>>> sentinel request between two video frames that tells the
>>>>> host if the transmission of "current" video frame was
>>>>> successful or not. (Is that correct?)
>>>>
>>>> Right.
>>>>
>>>>> Assuming my understanding of (2) is correct, how should
>>>>> the host behave if the transmission of the sentinel
>>>>> request fails after the video frame was sent
>>>>> successfully (isoc is best effort so transmission is
>>>>> never guaranteed)?
>>>>
>>>> If we have transmitted all requests of the frame but would only miss the
>>>> sentinel request to the host that includes the EOF, the host could
>>>> rather show it or drop it. The drop would not be necessary since the
>>>> host did see all data of the frame. The user would not see any
>>>> distortion in both cases.
>>>>
>>>> If we have transmitted only partial data of the frame it would be
>>>> preferred if the host would drop the frame that did not finish with an
>>>> proper EOF tag.
>>>>
>>>> AFAIK the linux kernel would still show the frame if the FID of the
>>>> currently handled request would change and would take this as the end of
>>>> frame indication. But I am not totally sure if this is by spec or
>>>> optional.
>>>>
>>>> One option to be totally sure would be to resend the sentinel request to
>>>> be properly transmitted before starting the next frame. This resend
>>>> polling would probably include some extra zero-length requests. But also
>>>> if this resend keeps failing for n times, the driver should doubt there
>>>> is anything sane going on with the USB connection and bail out somehow.
>>>>
>>>> Since we try to tackle case (1) to avoid transmit errors and also avoid
>>>> creating late enqueued requests in the running isoc transfer, the over
>>>> all chance to trigger missed transfers should be minimal.
>>>
>>> Gotcha. It seems like the UVC gadget driver implicitly assumes that EOF
>>> flag will be used although the userspace application can technically
>>> make it optional.
>>
>> That is not all. The additional UVC_STREAM_ERR tag on the sentinel
>> request can be set optional by the host driver. But by spec the
>> userspace application has to drop the frame when the flag was set.
>
>Looking at the UVC specs, the ERR bit doesn't seem to refer to actual
>transmission error, only errors in frame generation (Section 4.3.1.7
>of UVC 1.5 Class Specification). Maybe "data discontinuity" can be
>used but the examples given are bad media, and encoder issues, which
>suggests errors at higher level than the wire.

Oh! That is a new perspective I did not consider.

With the definition of UVC_STREAM_ERR by spec, the uvc_video driver
would in no case set this header bit for the current frame on its own?
Is that correct?

>> With my proposal this flag will be set, whenever we find out that
>> the currently transferred frame was erroneous.
>>
>>> Summarizing some of the discussions above:
>>> 1. UVC gadget driver should _not_ rely on the usb controller to
>>>   enqueue 0-length requests on UVC gadget drivers behalf;
>>> 2. However keeping up the backpressure to the controller means the
>>>   EOF request will be delayed behind all the zero-length requests.
>>
>> Exactly, this is why we have to somehow finetune the timedelay between
>> requests that trigger interrupts. And also monitor the amount of
>> requests currently enqueued in the hw ringbuffer. So that our drivers
>> enqueue dequeue mechanism is virtually adding only the minimum amount
>> of necessary zero-length requests in the hardware. This should be
>> possible.
>>
>> I am currently thinking through the remaining steps the pump worker has
>> to do on each wakeup to maintain the minimum threshold while waiting
>> with submitting requests that contain actual image payload.
>>
>>> Out of curiosity: What is wrong with letting the host rely on
>>> FID alone? Decoding the jpeg payload _should_ fail if any of the
>>> usb_requests containing the payload failed to transmit.
>>
>> This is not totally true. We saw partially rendered jpeg frames on the
>> host stream. How the host behaves with broken data is totally undefined
>> if the typical uvc flags EOF/ERR are not used as specified. Then think
>> about uncompressed formats. So relying on the transferred image format
>> to solve our problems is just as wrong as relying on the gadgets
>> hardware behavior.
>
>Do you know if the partially rendered frames were valid JPEGs, or
>if the host was simply making a best effort at displaying a broken
>JPEG? Perhaps the fix should go to the host instead?

I can fully reproduce this with linux and windows hosts. For linux
machines I saw that the host was taking the FID change as a marker
to see the previous frame as ready and just rendered what got through.
This did not lead to garbage but only to partially displayed frames
with jpeg macroblock alignment.

>Following is my opinion, feel free to disagree (and correct me if
>something is factually incorrect):
>
>The fundamental issue here is that ISOC doesn't guarantee
>delivery of usb_requests or even basic data consistency upon delivery.
>So the gadget driver has no way to know the state of transmitted data.
>The gadget driver is notified of underruns but not of any other issues,
>and ideally we should never have an underrun if the zero-length
>backpressure is working as intended.
>
>So, UVC gadget driver can reduce the number of errors, but it'll never be
>able to guarantee that the data transmitted to the host isn't somehow
>corrupted or missing unless a more reliable mode of transmission
>(bulk, for example) is used.
>
>All of this to say: The host absolutely needs to be able to handle
>all sorts of invalid and broken payloads. How the host handles it
>might be undefined, but the host can never rely on perfect knowledge
>about the transmission state. In cases like these, where the underlying
>transport is unreliable, the burden of enforcing consistency moves up
>a layer, i.e. to the encoded payload in this case. So it is perfectly
>fine for the host to rely on the encoding to determine if the payload
>is corrupt and handle it accordingly.

Right.

>As for uncompressed format, you're correct that subtle corruptions
>may not be caught, but outright missing usb_requests can be easily
>checked by simply looking at the number of bytes in the payload. YUV
>frames are all of the same (predetermined) size for a given resolution.

That was also my thought about five minutes after I did send you the
previous mail. So sure, this is no real issue for the host.

>So my recommendation is the following:
>1. Fix the bandwidth problem by splitting the encoded video frame
>   into more usb_requests (as your patch already does) making sure
>   there are enough free usb_request to encode the video frame in
>   one burst so we don't accidentally inflate the transmission
>   duration of a video frame by sneaking in zero-length requests in
>   the middle.

Ack. This should already solve a lot of issues.

For this I would still suggest to move the usb_ep_queue to be done in
the pump worker again. Its a bit back and forth, but IMHO its worth the
extra mile since only this way we would respect the dwc3 interrupt
threads assumption to run *very* short.

>2. Unless there is an unusually high rate of transmission failures
>   when using the UVC gadget driver, it might be worth fixing the
>   host side driver to handle broken frames better instead (assuming
>   host is linux as well).

Agreed, but this needs a separate scoped undestanding of the host side
behaviour over all layers.

>2. Tighten up the error checking in UVC gadget driver -- We drop the
>   current frame whenever an EXDEV happens which is wrong. We should
>   only be dropping the current frame if the EXDEV corresponds to the
>   frame currently being encoded.

What do you mean by drop?

I would suggest to immediatly switch the uvc_buffer that is being
enqueued and start queueing prepared requests from the next buffers prep
list. As suggested, the idea is to have per uvc_buffer prep_list
requests which would make this task easy.

>   If the frame is already fully queued to the usb controller, the host
>   can handle missing payload as it sees fit.

Ack

This roadmap sounds like a good one.

Michael

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ