Date:   Mon, 17 Oct 2022 21:10:49 -0500
From:   Dan Vacura <w36195@...orola.com>
To:     Thinh Nguyen <Thinh.Nguyen@...opsys.com>
Cc:     "linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
        Daniel Scally <dan.scally@...asonboard.com>,
        Jeff Vanhoof <qjv001@...orola.com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jonathan Corbet <corbet@....net>,
        Laurent Pinchart <laurent.pinchart@...asonboard.com>,
        Felipe Balbi <balbi@...nel.org>,
        Paul Elder <paul.elder@...asonboard.com>,
        Michael Grzeschik <m.grzeschik@...gutronix.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of
 release after missed isoc

Hi Thinh,

On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> On Mon, Oct 17, 2022, Dan Vacura wrote:
> > From: Jeff Vanhoof <qjv001@...orola.com>
> > 
> > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > no_interrupt=1 is used. This can happen if the hardware is still using
> > the data associated with a TRB after the usb_request's ->complete call
> > has been made.  Instead of immediately releasing a request when a Missed
> > ISOC interrupt has occurred, this change will add logic to cancel the
> > request instead where it will eventually be released when the
> > END_TRANSFER command has completed. This logic is similar to some of the
> > cleanup done in dwc3_gadget_ep_dequeue.
> 
> This doesn't sound right. How did you determine that the hardware is
> still using the data associated with the TRB? Did you check the TRB's
> HWO bit?

The problem we're seeing is described in the summary of this patch
series, as issue #1. Basically, with the following patch integrated:
https://patchwork.kernel.org/project/linux-usb/patch/20210628155311.16762-6-m.grzeschik@pengutronix.de/
an smmu panic occurs on our Android device running the 5.15 kernel:

    <3>[  718.314900][  T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!

The uvc gadget driver appears to be the first (and only) gadget that
uses the no_interrupt=1 logic, so this seems to be a new condition for
the dwc3 driver. In our configuration, we have up to 64 requests, with
no_interrupt=1 set for up to 15 requests. dep->started_list can grow to
that size by the time we loop through it to clean up the completed
requests. From testing and debugging, the smmu panic occurs when a
-EXDEV status shows up, right after
dwc3_gadget_ep_cleanup_completed_request() is visited. Our conclusion
was that the requests were being returned to the gadget too early.
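
To illustrate where the "64 requests, up to 15 with no_interrupt=1"
figures could come from, here is a minimal stand-alone sketch. It is not
the uvc gadget code: it assumes a completion interrupt is requested on
every ceil(num_requests / 4)-th request and that "up to 15" refers to the
consecutive requests queued in between. All names (div_round_up,
model_no_interrupt, model_max_silent_run) are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: integer division rounding up. */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
	return (n + d - 1) / d;
}

/*
 * Hypothetical model (assumption, not the real driver logic): request
 * number idx (0-based) is queued with no_interrupt=1 unless it falls on
 * an interval boundary of ceil(num_requests / 4).
 */
static bool model_no_interrupt(unsigned int idx, unsigned int num_requests)
{
	unsigned int interval = div_round_up(num_requests, 4);

	if (interval == 0)	/* no requests configured: nothing is silent */
		return false;

	return (idx % interval) != 0;
}

/*
 * Longest run of consecutive no_interrupt=1 requests, i.e. how many
 * requests can pile up before the next completion interrupt.
 */
static unsigned int model_max_silent_run(unsigned int num_requests)
{
	unsigned int run = 0, max_run = 0, i;

	for (i = 0; i < 2 * num_requests; i++) {
		if (model_no_interrupt(i, num_requests)) {
			run++;
			if (run > max_run)
				max_run = run;
		} else {
			run = 0;
		}
	}

	return max_run;
}
```

Under these assumptions, model_max_silent_run(64) evaluates to 15, which
matches the number of requests that can be waiting for cleanup when the
next interrupt arrives.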

> 
> The dwc3 driver would only give back the requests if the TRBs of the
> associated requests are completed or when the device is disconnected.
> If the TRB indicated missed isoc, that means that the TRB is completed
> and its status was updated.

Interesting. The device is not disconnected, as we don't get the
-ESHUTDOWN status back, and with this patch in place things continue
after a -EXDEV status is received.

> 
> There's a special case in which dwc3 may give back requests early: the
> case of the device disconnecting. The requests should be returned with
> -ESHUTDOWN, and the gadget driver shouldn't be re-using the requests on
> de-initialization anyway.
> 
> We should not issue End Transfer command just because of missed isoc. We
> may want to issue End Transfer if the gadget driver is too slow and unable
> to feed requests in time (causing underrun and missed isoc) to resync
> with the host, but we already handle that.

Hmm, isn't that what happens when we get into this
condition in dwc3_gadget_endpoint_trbs_complete():

	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
		list_empty(&dep->started_list) &&
		(list_empty(&dep->pending_list) || status == -EXDEV))
		dwc3_stop_active_transfer(dep, true, true);
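
The condition above can be restated as a stand-alone predicate to make
its truth table easier to check. This is only a sketch: the function
name and its boolean parameters are hypothetical stand-ins for the
endpoint-type and list checks, and it is not driver code.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check quoted above: the active transfer is
 * stopped only for an isoc endpoint whose started list is empty, and
 * then only if the pending list is also empty or the completion status
 * was -EXDEV (missed isoc).
 */
static bool model_should_stop_transfer(bool is_isoc, bool started_empty,
				       bool pending_empty, int status)
{
	return is_isoc && started_empty &&
	       (pending_empty || status == -EXDEV);
}
```

In this model, a missed isoc (-EXDEV) triggers the stop only once the
started list has drained. While requests remain on the started list, the
predicate is false regardless of status.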

> 
> I'm still not clear what's the problem you're seeing. Do you have the
> crash log? Tracepoints?
> 
> BR,
> Thinh

Appreciate the support!

Dan
