lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221025063359.GA19049@qjv001-XeonWs>
Date:   Tue, 25 Oct 2022 01:34:01 -0500
From:   Jeff Vanhoof <jdv1029@...il.com>
To:     linux-usb@...r.kernel.org
Cc:     linux-usb@...r.kernel.org,
        Daniel Scally <dan.scally@...asonboard.com>,
        Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jonathan Corbet <corbet@....net>,
        Laurent Pinchart <laurent.pinchart@...asonboard.com>,
        Felipe Balbi <balbi@...nel.org>,
        Paul Elder <paul.elder@...asonboard.com>,
        Michael Grzeschik <m.grzeschik@...gutronix.de>,
        linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org
Subject: uvc gadget performance issues with skip interrupt impl

Hi,

During the queuing up of requests from the UVC Gadget Driver to DWC3 for one
frame, if a missed isoc event occurs then it is possible for the next
consecutive frame(s) to also see missed isoc related errors as a result,
presenting to the user as a large video stall.

This issue appears to have come in with the skip interrupt implementation in
the UVC Gadget Driver:

usb: gadget: uvc: decrease the interrupt load to a quarter
https://lore.kernel.org/r/20210628155311.16762-6-m.grzeschik@pengutronix.de

Below is an example flow of how the issue can occur (and why).

For example (ISOC use case):
1) DWC3 driver has 4 requests queued up from the UVC Gadget Driver.

2) First request has IOC bit set due to no_interrupt=0 also being set, and IMI
bit is set to detect missed ISOC.

3) Requests 2,3,4 do not have IOC bit set due to no_interrupt=1 being set for
them. (Note: Whether or not the IMI bit is set for these requests does not
matter, issue can still crop up as there is no guarantee that request 2,3,4
will see a missed isoc event)

4) First request gets a missed isoc event and DWC3 returns the req and error to
UVC Gadget Driver.

5) UVC Gadget Driver, in uvc_video_complete, proceeds to cancel the queue by
calling uvcg_queue_cancel.

6) UVC Gadget Driver stops sending additional requests for the current frame.

7) DWC3 will still have requests 2,3,4 queued up and sitting in its
started_list as these requests are not given back to the UVC gadget driver
because they each have no_interrupt=1 set, and the DWC3 driver will not have
any additional interrupts triggered for them as a result.

8) Approximately 30-100ms later a new frame enters the UVC Gadget Driver (from
V4L2), and it proceeds to send additional requests to the DWC3 driver.

9) Because requests 2,3,4 are still sitting in the started_list of the dwc3
driver, the driver does not stop and restart the transmission that normally
helps it recover from the missed isoc situation (this usually happens in
between frames).

10) Some of the requests from the new frame will have no_interrupt=0 set, but
these requests will be considered missed/late by the DWC3 controller.

11) Because these new requests have the IOC bit set (and possibly IMI),
interrupts will be triggered causing the DWC3 Driver to return the req and
error to the UVC Gadget Driver.

12) And if the last set of requests sent by the UVC Gadget Driver have
"no_interrupt=1" set, then DWC3 may not interrupt further until new requests
come in, and the cycle of frame drops/errors will continue.

I have briefly mentioned this issue in another conversation with Thinh. At the
time he mentioned that 3 things could possibly be done to help resolve this
issue:

1) The UVC Gadget Driver should ensure that the last requests queued to DWC3
must always have "no_interrupt=0" set.

2) DWC3 can detect stale requests, stop the transmission and give back the
requests to the UVC Gadget Driver, and restart the transmission for the new set
of requests.

3) Set "no_interrupt=0" for each request.
 
I have tested out various implementations for all 3 possibilities and they each
seem to work ok. Note that these test implementations are not ready for prime
time, but served as a way to prove that potential changes in these areas could
help to resolve this issue.

I believe that a change for the UVC Gadget Driver should be made, but it also
makes sense for the DWC3 driver to also attempt to recover from this situation
if possible.

Does anyone have an opinion on the best way to proceed?

Thanks,
Jeff

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ