[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0b8abec0-c422-45f8-a8d6-998f33080373@rowland.harvard.edu>
Date: Sat, 2 Sep 2023 11:15:04 -0400
From: Alan Stern <stern@...land.harvard.edu>
To: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
Cc: Andrey Konovalov <andreyknvl@...il.com>,
Felipe Balbi <balbi@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
USB list <linux-usb@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: dwc3: unusual handling of setup requests with wLength == 0
On Fri, Sep 01, 2023 at 09:14:19PM +0000, Thinh Nguyen wrote:
> There's no Task Management transfer. All the packets are captured, but
> it's not easy to show in a single email as I need to manually export
> which view to see. Some view level (such as SCSI) would group the
> transfers, which may not show the entire picture. But it helps with the
> high level view. If there's Task Management command, then you would see
> it from the SCSI level.
Okay.
> Actually, something else was probably happened here.
>
> Some context:
> * Bulk OUT endp 4 is for command endpoint
> * Bulk IN endp 1 is for data endpoint
> * Bulk IN endp 3 is for status endpoint
>
> _______|_______________________________________________________________________
> Transfer(261) Left("Left") G2(x1) Bulk(OUT) ADDR(3) ENDP(4) UASP Command IU
> _______| Tag(0x0002) Logical Unit Number(0x 0000 0000 0000 0000) SCSI CDB
> _______| READ(10) Time(184.510 us) Time Stamp(10 . 005 791 128)
> _______|_______________________________________________________________________
> Transfer(264) Left("Left") G2(x1) Bulk(IN) ADDR(3) ENDP(1) Stream ID(0x0002)
> _______| UASP Data IN Bytes Transferred(524288) Time(945.654 us)
> _______| Time Stamp(10 . 005 975 638)
> _______|_______________________________________________________________________
> Transfer(266) Left("Left") G2(x1) Bulk(IN) ADDR(3) ENDP(3) Stream ID(0x0002)
> _______| UASP Sense IU Tag(0x0002) STATUS(GOOD) Time( 47.370 us)
> _______| Time Stamp(10 . 006 921 292)
> _______|_______________________________________________________________________
> Transfer(267) Left("Left") G2(x1) Bulk(OUT) ADDR(3) ENDP(4) UASP Command IU
> _______| Tag(0x0002) Logical Unit Number(0x 0000 0000 0000 0000) SCSI CDB
> _______| READ(10) Time(161.454 us) Time Stamp(10 . 006 968 662)
> _______|_______________________________________________________________________
> Transfer(272) Left("Left") G2(x1) Bulk(IN) ADDR(3) ENDP(1) Stream ID(0x0002)
> _______| UASP Data IN Bytes Transferred(146432) Time( 2.519 ms)
> _______| Time Stamp(10 . 007 130 116)
> _______|_______________________________________________________________________
>
>
> ## Transaction error occurs, the transfer stopped short at 146432 bytes
> ## instead of 512K.
>
>
> Transfer(289) Left("Left") G2(x1) Control(SET) ADDR(3) ENDP(0)
> _______| bRequest(CLEAR_FEATURE) wValue(ENDPOINT_HALT) wLength(0)
> _______| Time(166.322 us) Time Stamp(10 . 009 649 516)
> _______|_______________________________________________________________________
>
>
> # Host issues CLEAR_FEATURE(halt_ep)
We don't really know how the device responds to this request, other
than resetting the endpoint's sequence number to 0.
> Transfer(291) Left("Left") G2(x1) Bulk(OUT) ADDR(3) ENDP(4) UASP Command IU
> _______| Tag(0x0002) Logical Unit Number(0x 0000 0000 0000 0000) SCSI CDB
> _______| READ(10) Time(158.848 us) Time Stamp(10 . 009 815 838)
> _______|_______________________________________________________________________
>
>
> # Host completely dropped the SCSI command with transaction error. It
> # doesn't request for the status. Since it's dropped, tag 2 is
> # available. Now, a new SCSI command can use Tag 2.
That is what this looks like. I'm not aware of any place in the UAS
spec where it says that a host is allowed to do this.
It would be very interesting to know at this point if the host tried to
read more data on the data endpoint during the 160-microsecond period
before transfer 292. Does the full log show whether this happened?
Does the fact that the transfers have consecutive numbers mean that
nothing happened on the bus during this time? Was the device waiting
for the host before sending more data, or did it cancel the data that
was already queued?
> Transfer(292) Left("Left") G2(x1) Bulk(IN) ADDR(3) ENDP(3) Stream ID(0x0002)
> _______| UASP Response IU Tag(0x0002) RESPONSE_CODE(OVERLAPPED TAG)
> _______| Time(207.006 us) Time Stamp(10 . 009 974 686)
> _______|_______________________________________________________________________
>
>
> # Thinking that the previous Tag 2 command was still active and
> # responded with an OVERLAPPED TAG. This probably causes the gadget to
> # cancel the transfer and drop the command so it can be in sync again.
Indeed. If the Clear-Halt had caused the device to abort the ongoing
transfer, would it have responded this way? Wouldn't it have thought
that the previous Tag 2 command was completely finished? Or would it
have cancelled just the data part of the transfer, while keeping track
of the fact that the status part still needed to be sent?
To put it another way, it seems that the OVERLAPPED TAG response was
caused by the fact that the status was never sent. So whether or not
the device responded to the Clear-Halt by cancelling anything, it
became aware that something was wrong when it received this duplicate
tag.
> Transfer(295) Left("Left") G2(x1) Bulk(OUT) ADDR(3) ENDP(4) UASP Command IU
> _______| Tag(0x0002) Logical Unit Number(0x 0000 0000 0000 0000) SCSI CDB
> _______| READ(10) Time(884.218 ms) Time Stamp(10 . 010 181 692)
> _______|_______________________________________________________________________
> Transfer(313) Left("Left") G2(x1) Bulk(IN) ADDR(3) ENDP(1) Stream ID(0x0002)
> _______| UASP Data IN Bytes Transferred(524288) Time(554.648 us)
> _______| Time Stamp(10 . 894 399 886)
> _______|_______________________________________________________________________
> Transfer(314) Left("Left") G2(x1) Bulk(IN) ADDR(3) ENDP(3) Stream ID(0x0002)
> _______| UASP Sense IU Tag(0x0002) STATUS(GOOD) Time(127.515 ms)
> _______| Time Stamp(10 . 894 954 534)
> _______|_______________________________________________________________________
> Transfer(315) Left("Left") G2(x1) Bulk(OUT) ADDR(3) ENDP(4) UASP Command IU
> _______| Tag(0x0002) Logical Unit Number(0x 0000 0000 0000 0000) SCSI CDB
> _______| READ(10) Time(278.770 us) Time Stamp(11 . 022 469 104)
> _______|_______________________________________________________________________
> Transfer(316) Left("Left") G2(x1) Bulk(IN) ADDR(3) ENDP(1) Stream ID(0x0002)
> _______| UASP Data IN Bytes Transferred(524288) Time(554.236 us)
> _______| Time Stamp(11 . 022 747 874)
> _______|_______________________________________________________________________
> Transfer(317) Left("Left") G2(x1) Bulk(IN) ADDR(3) ENDP(3) Stream ID(0x0002)
> _______| UASP Sense IU Tag(0x0002) STATUS(GOOD) Time( 49.406 us)
> _______| Time Stamp(11 . 023 302 110)
> _______|_______________________________________________________________________
>
>
> So, for this scenario, the host was probably stay in sync with the
> device due to the overlap command tag id canceling the transfer. I'm not
> sure if this is the intent of the Windows UASP class driver, which seems
> like a non-conventional way of synchronization. Perhaps it was done
> because some devices may not support TASK_MANAGEMENT(Abort_task)?
That's possible. Windows seems to use a non-conventional approach to
many things.
> Regardless, if the host resets the endpoint, the transfer must be
> canceled otherwise we risk data corruption.
Do we? What would have happened if the transfer had not been
cancelled? The device might have sent some data left over from the
original command and the host might have misinterpreted it as belong to
the new command. But when the device sent the OVERLAPPED TAG response,
the host would have realized that any data it received for the
new command was invalid and abandoned the command.
In fact, that's what it did do in transfer 295. As far as I can tell
from the log, the Clear-Halt didn't cause the device to fully cancel
the transfer.
> Also whenever there's a OVERLAPPED tag error, Windows host takes a long
> time (~1 sec) to send a new command (check delta time of Transfer 295
> and 313). If the gadget driver can base off of the
> CLEAR_FEATURE(halt_ep), this improves performance.
Okay, that's a good point.
However, since not cancelling the transfer apparently did not lead to
corruption, I think it's okay to allow UDC drivers not to cancel
requests when they receive a Clear-Halt. This decision could be left
up to the driver.
Alan Stern
Powered by blists - more mailing lists