lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240307015736.4dhcrzsli4dihym5@synopsys.com>
Date: Thu, 7 Mar 2024 01:57:44 +0000
From: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
To: Michael Grzeschik <mgr@...gutronix.de>
CC: Thinh Nguyen <Thinh.Nguyen@...opsys.com>, Dan Vacura <w36195@...orola.com>,
        "linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
        Daniel Scally <dan.scally@...asonboard.com>,
        Jeff Vanhoof <qjv001@...orola.com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jonathan Corbet <corbet@....net>,
        Laurent Pinchart <laurent.pinchart@...asonboard.com>,
        Felipe Balbi <balbi@...nel.org>,
        Paul Elder <paul.elder@...asonboard.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of
 release after missed isoc

On Tue, Feb 27, 2024, Michael Grzeschik wrote:
> On Thu, Feb 22, 2024 at 01:20:04AM +0000, Thinh Nguyen wrote:
> > On Thu, Feb 22, 2024, Michael Grzeschik wrote:
> > > For #2: I found an issue in the handling of the completion of requests in
> > > the started list. When the interrupt handler is *explicitly* calling
> > > stop_active_transfer if the overall event of the request was an missed
> > > event. This event value only represents the value of the request that
> > > was actually triggering the interrupt.
> > > 
> > > It also calls ep_cleanup_completed_requests and is iterating over the
> > > started requests and will call giveback/complete functions of the
> > > requests with the proper request status.
> > > 
> > > So this will also catch missed requests in the queue. However, since
> > > there might be, lets say 5 good requests and one missed request, what
> > > will happen is, that each complete call for the first good requests will
> > > enqueue new requests into the started list and will also call the
> > > updatecmd on that transfer that was already missed until the loop will
> > > reach the one request with the MISSED status bit set.
> > > 
> > > So in my opinion the patch from Jeff makes sense when adding the
> > > following change aswell. With those both changes the underruns and
> > > broken frames finally disappear. I am still unsure about the complete
> > > solution about that, since with this the mentioned 5 good requests
> > > will be cancelled aswell. So this is still a WIP status here.
> > > 
> > 
> > When the dwc3 driver issues stop_active_transfer(), that means that the
> > started_list is empty and there is an underrun.
> 
> At this moment this is only the case when both, pending and started list
> are empty. Or the interrupt event was EXDEV.
> 
> The main problem is that the function
> dwc3_gadget_ep_cleanup_completed_requests(dep, event, status); will
> issue an complete for each started request, which on the other hand will
> refill the pending list, and therefor after that refill the
> stop_active_transfer is currently never hit.
> 
> > It treats the incoming requests as staled. However, for UVC, they are
> > still "good".
> 
> Right, so in that case we can requeue them anyway. But this will have to
> be done after the stop transfer cmd has finished.
> 
> > I think you can just check if the started_list is empty before queuing
> > new requests. If it is, perform stop_active_transfer() to reschedule the
> > incoming requests. None of the newly queue requests will be released
> > yet since they are in the pending_list.
> 
> So that is basically exactly what my patch is doing. However in the case
> of an underrun it is not safe to call dwc3_gadget_ep_cleanup_completed_requests
> as jeff stated. So his underlying patch is really fixing an issue here.

What I mean is to actively check for started list on every
usb_ep_queue() call. Checking during
dwc3_gadget_ep_cleanup_completed_requests() is already too late.

> 
> > For UVC, perhaps you can introduce a new flag to usb_request called
> > "ignore_queue_latency" or something equivalent. The dwc3 is already
> > partially doing this for UVC. With this new flag, we can rework dwc3 to
> > clearly separate the expected behavior from the function driver.
> 
> I don't know why this "extra" flag is even necessary. The code example
> is already working without that extra flag.

The flag is for controller to determine what kinds of behavior the
function driver expects. My intention is if this extra flag is not set,
the dwc3 driver will not attempt to reshcedule isoc request at all (ie.
no stop_active_transfer()).

> 
> Actually I even came up with an better solution. Additionally of checking if
> one of the requests in the started list was missed, we can activly check if
> the trb ring did run dry and if dwc3_gadget_endpoint_trbs_complete is
> going to enqueue in to the empty trb ring.
> 
> So my whole change looks like that:
> 
> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
> index efe6caf4d0e87..2c8047dcd1612 100644
> --- a/drivers/usb/dwc3/core.h
> +++ b/drivers/usb/dwc3/core.h
> @@ -952,6 +952,7 @@ struct dwc3_request {
>  #define DWC3_REQUEST_STATUS_DEQUEUED		3
>  #define DWC3_REQUEST_STATUS_STALLED		4
>  #define DWC3_REQUEST_STATUS_COMPLETED		5
> +#define DWC3_REQUEST_STATUS_MISSED_ISOC		6
>  #define DWC3_REQUEST_STATUS_UNKNOWN		-1
>  	u8			epnum;
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 858fe4c299b7a..a31f4d3502bd3 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2057,6 +2057,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>  		req = next_request(&dep->cancelled_list);
>  		dwc3_gadget_ep_skip_trbs(dep, req);
>  		switch (req->status) {
> +		case 0:
> +			dwc3_gadget_giveback(dep, req, 0);
> +			break;
>  		case DWC3_REQUEST_STATUS_DISCONNECTED:
>  			dwc3_gadget_giveback(dep, req, -ESHUTDOWN);
>  			break;
> @@ -2066,6 +2069,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>  		case DWC3_REQUEST_STATUS_STALLED:
>  			dwc3_gadget_giveback(dep, req, -EPIPE);
>  			break;
> +		case DWC3_REQUEST_STATUS_MISSED_ISOC:
> +			dwc3_gadget_giveback(dep, req, -EXDEV);
> +			break;
>  		default:
>  			dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
>  			dwc3_gadget_giveback(dep, req, -ECONNRESET);
> @@ -3509,6 +3515,36 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
>  	return ret;
>  }
> +static int dwc3_gadget_ep_check_missed_requests(struct dwc3_ep *dep)
> +{
> +	struct dwc3_request	*req;
> +	struct dwc3_request	*tmp;
> +	int ret = 0;
> +
> +	list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
> +		struct dwc3_trb *trb;
> +
> +		trb = req->trb;
> +		switch (DWC3_TRB_SIZE_TRBSTS(trb->size)) {
> +		case DWC3_TRBSTS_MISSED_ISOC:
> +			/* Isoc endpoint only */
> +			ret = -EXDEV;
> +			break;
> +		case DWC3_TRB_STS_XFER_IN_PROG:
> +			/* Applicable when End Transfer with ForceRM=0 */
> +		case DWC3_TRBSTS_SETUP_PENDING:
> +			/* Control endpoint only */
> +		case DWC3_TRBSTS_OK:
> +		default:
> +			ret = 0;
> +			break;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
>  static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
>  		const struct dwc3_event_depevt *event, int status)
>  {
> @@ -3565,22 +3601,51 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
>  {
>  	struct dwc3		*dwc = dep->dwc;
>  	bool			no_started_trb = true;
> +	unsigned int		transfer_in_flight = 0;
> +
> +	/* It is possible that the interrupt thread was delayed by
> +	 * scheduling in the system, and therefor the HW has already
> +	 * run dry. In that case the last trb in the queue is already
> +	 * handled by the hw. By checking the HWO bit we know to restart
> +	 * the whole transfer. The condition to appear is more likelely
> +	 * if not every trb has the IOC bit set and therefor does not
> +	 * trigger the interrupt thread fewer.
> +	 */
> +	if (dep->number && usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
> +		struct dwc3_trb *trb;
> -	dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
> +		trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
> +		transfer_in_flight = trb->ctrl & DWC3_TRB_CTRL_HWO;
> +	}
> -	if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
> -		goto out;
> +	if (status == -EXDEV || !transfer_in_flight) {
> +		struct dwc3_request *tmp;
> +		struct dwc3_request *req;
> -	if (!dep->endpoint.desc)
> -		return no_started_trb;
> +		if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
> +			dwc3_stop_active_transfer(dep, true, true);
> -	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> -		list_empty(&dep->started_list) &&
> -		(list_empty(&dep->pending_list) || status == -EXDEV))
> -		dwc3_stop_active_transfer(dep, true, true);
> -	else if (dwc3_gadget_ep_should_continue(dep))
> -		if (__dwc3_gadget_kick_transfer(dep) == 0)
> -			no_started_trb = false;
> +		list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
> +			dwc3_gadget_move_cancelled_request(req,
> +					(DWC3_TRB_SIZE_TRBSTS(req->trb->size) == DWC3_TRBSTS_MISSED_ISOC) ?
> +					DWC3_REQUEST_STATUS_MISSED_ISOC : 0);
> +		}
> +	} else {
> +		dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
> +
> +		if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
> +			goto out;
> +
> +		if (!dep->endpoint.desc)
> +			return no_started_trb;
> +
> +		if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> +			list_empty(&dep->started_list) && list_empty(&dep->pending_list))
> +			dwc3_stop_active_transfer(dep, true, true);
> +		else if (dwc3_gadget_ep_should_continue(dep))
> +			if (__dwc3_gadget_kick_transfer(dep) == 0)
> +				no_started_trb = false;
> +	}
>  out:
>  	/*
> 
> I will seperate the whole hunk into smaller changes and send an v1
> the next days to review.
> 

No, we should not reschedule for every missed-isoc. We only want to
target underrun condition.

Thanks,
Thinh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ