lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 7 Mar 2024 17:15:12 +0100
From: Michael Grzeschik <mgr@...gutronix.de>
To: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
Cc: Dan Vacura <w36195@...orola.com>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	Daniel Scally <dan.scally@...asonboard.com>,
	Jeff Vanhoof <qjv001@...orola.com>,
	"stable@...r.kernel.org" <stable@...r.kernel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Jonathan Corbet <corbet@....net>,
	Laurent Pinchart <laurent.pinchart@...asonboard.com>,
	Felipe Balbi <balbi@...nel.org>,
	Paul Elder <paul.elder@...asonboard.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of
 release after missed isoc

On Thu, Mar 07, 2024 at 01:57:44AM +0000, Thinh Nguyen wrote:
>On Tue, Feb 27, 2024, Michael Grzeschik wrote:
>> On Thu, Feb 22, 2024 at 01:20:04AM +0000, Thinh Nguyen wrote:
>> > On Thu, Feb 22, 2024, Michael Grzeschik wrote:
>> > > For #2: I found an issue in the handling of the completion of requests in
>> > > the started list. When the interrupt handler is *explicitly* calling
>> > > stop_active_transfer if the overall event of the request was an missed
>> > > event. This event value only represents the value of the request that
>> > > was actually triggering the interrupt.
>> > >
>> > > It also calls ep_cleanup_completed_requests and is iterating over the
>> > > started requests and will call giveback/complete functions of the
>> > > requests with the proper request status.
>> > >
>> > > So this will also catch missed requests in the queue. However, since
>> > > there might be, lets say 5 good requests and one missed request, what
>> > > will happen is, that each complete call for the first good requests will
>> > > enqueue new requests into the started list and will also call the
>> > > updatecmd on that transfer that was already missed until the loop will
>> > > reach the one request with the MISSED status bit set.
>> > >
>> > > So in my opinion the patch from Jeff makes sense when adding the
>> > > following change aswell. With those both changes the underruns and
>> > > broken frames finally disappear. I am still unsure about the complete
>> > > solution about that, since with this the mentioned 5 good requests
>> > > will be cancelled aswell. So this is still a WIP status here.
>> > >
>> >
>> > When the dwc3 driver issues stop_active_transfer(), that means that the
>> > started_list is empty and there is an underrun.
>>
>> At this moment this is only the case when both, pending and started list
>> are empty. Or the interrupt event was EXDEV.
>>
>> The main problem is that the function
>> dwc3_gadget_ep_cleanup_completed_requests(dep, event, status); will
>> issue an complete for each started request, which on the other hand will
>> refill the pending list, and therefor after that refill the
>> stop_active_transfer is currently never hit.
>>
>> > It treats the incoming requests as staled. However, for UVC, they are
>> > still "good".
>>
>> Right, so in that case we can requeue them anyway. But this will have to
>> be done after the stop transfer cmd has finished.
>>
>> > I think you can just check if the started_list is empty before queuing
>> > new requests. If it is, perform stop_active_transfer() to reschedule the
>> > incoming requests. None of the newly queue requests will be released
>> > yet since they are in the pending_list.
>>
>> So that is basically exactly what my patch is doing. However in the case
>> of an underrun it is not safe to call dwc3_gadget_ep_cleanup_completed_requests
>> as jeff stated. So his underlying patch is really fixing an issue here.
>
>What I mean is to actively check for started list on every
>usb_ep_queue() call. Checking during
>dwc3_gadget_ep_cleanup_completed_requests() is already too late.

I see.

>>
>> > For UVC, perhaps you can introduce a new flag to usb_request called
>> > "ignore_queue_latency" or something equivalent. The dwc3 is already
>> > partially doing this for UVC. With this new flag, we can rework dwc3 to
>> > clearly separate the expected behavior from the function driver.
>>
>> I don't know why this "extra" flag is even necessary. The code example
>> is already working without that extra flag.
>
>The flag is for controller to determine what kinds of behavior the
>function driver expects. My intention is if this extra flag is not set,
>the dwc3 driver will not attempt to reshcedule isoc request at all (ie.
>no stop_active_transfer()).

Ok.

>>
>> Actually I even came up with an better solution. Additionally of checking if
>> one of the requests in the started list was missed, we can activly check if
>> the trb ring did run dry and if dwc3_gadget_endpoint_trbs_complete is
>> going to enqueue in to the empty trb ring.
>>
>> So my whole change looks like that:
>>
>> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
>> index efe6caf4d0e87..2c8047dcd1612 100644
>> --- a/drivers/usb/dwc3/core.h
>> +++ b/drivers/usb/dwc3/core.h
>> @@ -952,6 +952,7 @@ struct dwc3_request {
>>  #define DWC3_REQUEST_STATUS_DEQUEUED		3
>>  #define DWC3_REQUEST_STATUS_STALLED		4
>>  #define DWC3_REQUEST_STATUS_COMPLETED		5
>> +#define DWC3_REQUEST_STATUS_MISSED_ISOC		6
>>  #define DWC3_REQUEST_STATUS_UNKNOWN		-1
>>  	u8			epnum;
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index 858fe4c299b7a..a31f4d3502bd3 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -2057,6 +2057,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>>  		req = next_request(&dep->cancelled_list);
>>  		dwc3_gadget_ep_skip_trbs(dep, req);
>>  		switch (req->status) {
>> +		case 0:
>> +			dwc3_gadget_giveback(dep, req, 0);
>> +			break;
>>  		case DWC3_REQUEST_STATUS_DISCONNECTED:
>>  			dwc3_gadget_giveback(dep, req, -ESHUTDOWN);
>>  			break;
>> @@ -2066,6 +2069,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>>  		case DWC3_REQUEST_STATUS_STALLED:
>>  			dwc3_gadget_giveback(dep, req, -EPIPE);
>>  			break;
>> +		case DWC3_REQUEST_STATUS_MISSED_ISOC:
>> +			dwc3_gadget_giveback(dep, req, -EXDEV);
>> +			break;
>>  		default:
>>  			dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
>>  			dwc3_gadget_giveback(dep, req, -ECONNRESET);
>> @@ -3509,6 +3515,36 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
>>  	return ret;
>>  }
>> +static int dwc3_gadget_ep_check_missed_requests(struct dwc3_ep *dep)
>> +{
>> +	struct dwc3_request	*req;
>> +	struct dwc3_request	*tmp;
>> +	int ret = 0;
>> +
>> +	list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
>> +		struct dwc3_trb *trb;
>> +
>> +		trb = req->trb;
>> +		switch (DWC3_TRB_SIZE_TRBSTS(trb->size)) {
>> +		case DWC3_TRBSTS_MISSED_ISOC:
>> +			/* Isoc endpoint only */
>> +			ret = -EXDEV;
>> +			break;
>> +		case DWC3_TRB_STS_XFER_IN_PROG:
>> +			/* Applicable when End Transfer with ForceRM=0 */
>> +		case DWC3_TRBSTS_SETUP_PENDING:
>> +			/* Control endpoint only */
>> +		case DWC3_TRBSTS_OK:
>> +		default:
>> +			ret = 0;
>> +			break;
>> +		}
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>  static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
>>  		const struct dwc3_event_depevt *event, int status)
>>  {
>> @@ -3565,22 +3601,51 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
>>  {
>>  	struct dwc3		*dwc = dep->dwc;
>>  	bool			no_started_trb = true;
>> +	unsigned int		transfer_in_flight = 0;
>> +
>> +	/* It is possible that the interrupt thread was delayed by
>> +	 * scheduling in the system, and therefor the HW has already
>> +	 * run dry. In that case the last trb in the queue is already
>> +	 * handled by the hw. By checking the HWO bit we know to restart
>> +	 * the whole transfer. The condition to appear is more likelely
>> +	 * if not every trb has the IOC bit set and therefor does not
>> +	 * trigger the interrupt thread fewer.
>> +	 */
>> +	if (dep->number && usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
>> +		struct dwc3_trb *trb;
>> -	dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
>> +		trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
>> +		transfer_in_flight = trb->ctrl & DWC3_TRB_CTRL_HWO;
>> +	}
>> -	if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>> -		goto out;
>> +	if (status == -EXDEV || !transfer_in_flight) {
>> +		struct dwc3_request *tmp;
>> +		struct dwc3_request *req;
>> -	if (!dep->endpoint.desc)
>> -		return no_started_trb;
>> +		if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
>> +			dwc3_stop_active_transfer(dep, true, true);
>> -	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
>> -		list_empty(&dep->started_list) &&
>> -		(list_empty(&dep->pending_list) || status == -EXDEV))

@[!!here!!]

>> -		dwc3_stop_active_transfer(dep, true, true);
>> -	else if (dwc3_gadget_ep_should_continue(dep))
>> -		if (__dwc3_gadget_kick_transfer(dep) == 0)
>> -			no_started_trb = false;
>> +		list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
>> +			dwc3_gadget_move_cancelled_request(req,
>> +					(DWC3_TRB_SIZE_TRBSTS(req->trb->size) == DWC3_TRBSTS_MISSED_ISOC) ?
>> +					DWC3_REQUEST_STATUS_MISSED_ISOC : 0);
>> +		}
>> +	} else {
>> +		dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
>> +
>> +		if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>> +			goto out;
>> +
>> +		if (!dep->endpoint.desc)
>> +			return no_started_trb;
>> +
>> +		if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
>> +			list_empty(&dep->started_list) && list_empty(&dep->pending_list))
>> +			dwc3_stop_active_transfer(dep, true, true);
>> +		else if (dwc3_gadget_ep_should_continue(dep))
>> +			if (__dwc3_gadget_kick_transfer(dep) == 0)
>> +				no_started_trb = false;
>> +	}
>>  out:
>>  	/*
>>
>> I will seperate the whole hunk into smaller changes and send an v1
>> the next days to review.
>>

I finally send a v1 of my series.

https://lore.kernel.org/linux-usb/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@pengutronix.de/

For the rest of the discussion, I would like to move the conversation to
the newly send series.

>No, we should not reschedule for every missed-isoc. We only want to
>target underrun condition.

As you stated above, with reschedule what you mean is calling
stop_transfer after a missed transfer was seen?

If so, why is this condition in there already? (@[!!here!!])

Michael

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ