Message-ID: <20230901012322.rwpj32rx36xjtlb6@synopsys.com>
Date:   Fri, 1 Sep 2023 01:27:34 +0000
From:   Thinh Nguyen <Thinh.Nguyen@...opsys.com>
To:     Alan Stern <stern@...land.harvard.edu>
CC:     Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
        Andrey Konovalov <andreyknvl@...il.com>,
        Felipe Balbi <balbi@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        USB list <linux-usb@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: dwc3: unusual handling of setup requests with wLength == 0

On Thu, Aug 31, 2023, Alan Stern wrote:
> On Thu, Aug 31, 2023 at 02:43:51AM +0000, Thinh Nguyen wrote:
> > On Wed, Aug 30, 2023, Alan Stern wrote:
> > > On Wed, Aug 30, 2023 at 01:32:28AM +0000, Thinh Nguyen wrote:
> > > > That reminds me of another thing: if the host (xhci in this case) does a
> > > > hard reset to the endpoint, it also resets the TRB pointer with dequeue
> > > > ep command. So, the transfer should not resume. It needs to be
> > > > cancelled. This xHCI behavior is the same for Windows and Linux.
> > > 
> > > That's on the host side, right?  How does this affect the gadget side?
> > > 
> > > That is, cancelling a transfer on the host doesn't necessarily mean it 
> > > has to be cancelled on the gadget.  Does it have any implications at all 
> > > for the gadget driver?
> > 
> > There are 2 things that need to be in sync between host and device:
> > 1) The data sequence.
> 
> You mean the USB-3 sequence number value?

Yes.

> 
> > 2) The transfer.
> > 
> > If the host doesn't send CLEAR_FEATURE(halt_ep), best case scenario, the
> > data sequence doesn't match and the host issues a usb reset after some
> > timeout because the packet won't go through.
> 
> The data toggles in USB-2, which are analogous to the sequence numbers 
> in USB-3, don't work the same way.  When a USB-2 controller receives a 
> data packet with the wrong sequence number, it sends an ACK response but 
> otherwise ignores it.  This prevents timeouts (but not other types of 
> errors).
> 
> >  Worst case scenario, the
> > data sequence matches 0, and the wrong data is received causing
> > corruption.
> > 
> > If the device doesn't cancel the transfer in response to
> > CLEAR_FEATURE(halt_ep), it may send/receive data of a different transfer
> > because the host doesn't resume where it left off, causing corruption.
> > 
> > Based on the class protocol, the class driver and gadget driver know
> > what makes up a "transfer" and can appropriately cancel a transfer to
> > stay in sync.
> 
> You're still thinking of UAS in particular, right?  What I would expect 
> to happen when there's a transaction error in a UAS data transfer, based 
> on reading the UAS spec, is that the host would cancel the transfer on 
> its side and send either an Abort Task or an I_T Nexus Reset task 
> management request to the device (in addition to resetting the host 
> endpoint and sending a Clear-Halt).  I would not expect the host to hope 
> that the device would abandon the transfer merely because it got the 
> Clear-Halt.
> 
> Does Windows really work this way?  Does it not send a task management 
> request?  That would definitely seem to be against the intent of the 
> spec, if not against the letter.

Unfortunately, yes. I don't see any Task Management request aborting the
transfer.

> 
> > > How does the gadget driver sync with the host if the class protocol 
> > > doesn't say what should be done?
> > > 
> > > Also, what if there is no active transfer?  That is, what if the 
> > > transaction that got an error on the host appeared to be successful on 
> > > the gadget and it was the last transaction in the final transfer queued 
> > > for the endpoint?  How would the UDC driver notify the gadget driver in 
> > > this situation?
> > 
> > That's fine. If there's no active transfer, the gadget doesn't need to
> > cancel anything. As long as the host knows that the transfer did not
> > complete, it can retry and be in sync. For UASP, the host will send a
> > new MSC command to retry the failed transfer. ie. The host would
> > overwrite/re-read the transfer with the same transfer offset.
> > 
> > The problem arises if the gadget attempts to resume the incomplete
> > transfer.
> 
> Quite so.  But would the host send a new MSC retry command before the 
> failed command completes?

The host sends a new MSC command after the incomplete command has failed.
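
As a rough illustration of why resuming the incomplete transfer is a
problem, here is a toy model in Python. It is not dwc3 or UASP code:
the ToyDevice class and its methods are hypothetical, and the rule that
the device ignores a new command while an old transfer is still pending
is a simplifying assumption. The byte counts are taken from the
analyzer trace further down in this message.

```python
# Toy model only: what the host gets for a retried READ(10) depending on
# whether the device cancels the active transfer on CLEAR_FEATURE(halt_ep).

XFER_LEN = 524288           # bytes per READ(10) in the trace
SENT_BEFORE_ERROR = 146432  # bytes seen before the transaction error


class ToyDevice:
    """Hypothetical device-side state for one IN data endpoint."""

    def __init__(self):
        self.remaining = 0  # bytes still queued for the active transfer

    def start_transfer(self, length):
        # Simplifying assumption: a new transfer is queued only when idle.
        if self.remaining == 0:
            self.remaining = length

    def send(self, nbytes):
        sent = min(nbytes, self.remaining)
        self.remaining -= sent
        return sent

    def clear_halt(self, cancel_active):
        # The point under discussion: cancel the active transfer or not.
        if cancel_active:
            self.remaining = 0


def retry_after_error(cancel_active):
    dev = ToyDevice()
    dev.start_transfer(XFER_LEN)
    dev.send(SENT_BEFORE_ERROR)    # transaction error hits after this
    dev.clear_halt(cancel_active)  # host sends CLEAR_FEATURE(halt_ep)
    dev.start_transfer(XFER_LEN)   # host retries the whole command
    return dev.send(XFER_LEN)      # bytes the host gets for the retry


print(retry_after_error(cancel_active=True))   # 524288
print(retry_after_error(cancel_active=False))  # 377856
```

Without the cancel, the model device only sources the 377856 bytes left
over from the old transfer, so the host's retried 524288-byte transfer
never completes. That is the hang described below.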

> 
> > > >  This is observed in
> > > > UASP driver in Windows and how various consumer UASP devices handle it.
> > > 
> > > I don't understand what you're saying here.  How can you observe whether 
> > > a transfer is cancelled in a consumer UAS device?  And how does the 
> > > consumer device resync with the host?
> > 
> > You can see a hang if the transfers are out of sync. If the transfer
> > isn't cancelled, the device would only source/sink whatever remains
> > of the previous transfer, which is not enough to complete the new
> > transfer. The host sees the new transfer as incomplete, hence the
> > hang and the usb reset.
> > 
> > > 
> > > > There's no equivalent of the Bulk-Only Mass Storage Reset request in the
> > > > class protocol. We still have the USB analyzer traces for this.
> > > 
> > > Can you post an example?  Not necessarily in complete detail, but enough 
> > > so that we can see what's going on.
> > > 
> > > > Regardless of whether the class protocol spells out how to handle the
> > > > transaction error, if there's a transaction error, the host may send
> > > > CLEAR_FEATURE(halt_ep) as observed in Windows. The gadget driver needs
> > > > to know about it to cancel the active transfer and resync with the host.
> > > 
> > > I'll be able to understand this better after seeing an example.  Do you 
> > > have any traces that were made for a High-speed connection (say, using 
> > > a USB-2 cable)?  It would probably be easier to follow than a SuperSpeed 
> > > example.
> > > 
> > 
> > Unfortunately I only have LeCroy usb analyzer traces of Gen 2x1, not for
> > usb2 speed. It's a bit tricky converting it to text with all the proper
> > info to see all the context. If my explanation isn't clear, I'll try to
> > figure out how to proceed.
> 
> I would appreciate seeing whatever you can provide.
> 

Here's a snippet, captured at the SCSI level, showing how a Samsung T7
device responds to a CLEAR_FEATURE(halt_ep) sent by the host (Windows 10)
to the IN data endpoint. Similar behavior is observed for the OUT endpoint.


_______|_______________________________________________________________________
SCSI Op(80) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x0928E800) STATUS(GOOD) Data(524288 bytes) 
_______| Time(  1.335 ms) Time Stamp(10 . 000 538 006) Metrics #Xfers(3) 
_______|_______________________________________________________________________
SCSI Op(81) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x0928EC00) STATUS(GOOD) Data(524288 bytes) 
_______| Time(  1.318 ms) Time Stamp(10 . 001 872 988) Metrics #Xfers(3) 
_______|_______________________________________________________________________
SCSI Op(82) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x0928F000) STATUS(GOOD) Data(524288 bytes) 
_______| Time(  1.343 ms) Time Stamp(10 . 003 191 188) Metrics #Xfers(3) 
_______|_______________________________________________________________________
SCSI Op(83) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x0928F400) STATUS(GOOD) Data(524288 bytes) 
_______| Time(  1.256 ms) Time Stamp(10 . 004 534 630) Metrics #Xfers(3) 
_______|_______________________________________________________________________
SCSI Op(84) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x0928F800) STATUS(GOOD) Data(524288 bytes) 
_______| Time(  1.178 ms) Time Stamp(10 . 005 791 128) Metrics #Xfers(3) 
_______|_______________________________________________________________________
SCSI Op(85) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x0928FC00) Data(146432 bytes) Status(Missing)-BAD 
_______| Time(  2.681 ms) Time Stamp(10 . 006 968 662) Metrics #Xfers(2) 
_______|_______________________________________________________________________


## Transaction error occurs here.

Transfer(289) Left("Left") G2(x1) Control(SET) ADDR(3) ENDP(0) 
_______| bRequest(CLEAR_FEATURE) wValue(ENDPOINT_HALT) wLength(0) 
_______| Time(166.322 us) Time Stamp(10 . 009 649 516) 
_______|_______________________________________________________________________

## CLEAR_FEATURE happens here.

SCSI Op(99) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x09290000) RESPONSE_CODE(OVERLAPPED TAG) 
_______| Time(365.854 us) Time Stamp(10 . 009 815 838) Metrics #Xfers(2) 
_______|_______________________________________________________________________
SCSI Op(100) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x09290400) STATUS(GOOD) Data(524288 bytes) 
_______| Time(  1.012 sec) Time Stamp(10 . 010 181 692) Metrics #Xfers(3) 
_______|_______________________________________________________________________
SCSI Op(101) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x0928FC00) STATUS(GOOD) Data(524288 bytes) 
_______| Time(882.412 us) Time Stamp(11 . 022 469 104) Metrics #Xfers(3) 
_______|_______________________________________________________________________

## Host retries transfer here. Check logical block address.

SCSI Op(102) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x09290000) STATUS(GOOD) Data(524288 bytes) 
_______| Time(  1.060 ms) Time Stamp(11 . 023 351 516) Metrics #Xfers(3) 
_______|_______________________________________________________________________
SCSI Op(103) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x09290800) STATUS(GOOD) Data(524288 bytes) 
_______| Time(  1.013 ms) Time Stamp(11 . 024 411 510) Metrics #Xfers(3) 
_______|_______________________________________________________________________
SCSI Op(104) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x09290C00) STATUS(GOOD) Data(524288 bytes) 
_______| Time(816.594 us) Time Stamp(11 . 025 424 600) Metrics #Xfers(3) 
_______|_______________________________________________________________________
SCSI Op(105) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x09291000) STATUS(GOOD) Data(524288 bytes) 
_______| Time(762.286 us) Time Stamp(11 . 026 241 194) Metrics #Xfers(3) 
_______|_______________________________________________________________________
SCSI Op(106) ADDR(3) Tag(0x0002) SCSI CDB READ(10) 
_______| Logical Block Addr(0x09291400) STATUS(GOOD) Data(524288 bytes) 
_______| Time(768.696 us) Time Stamp(11 . 027 003 480) Metrics #Xfers(3) 
_______|_______________________________________________________________________
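
To make the retry easier to spot, the trace above can be cross-checked
with a short Python sketch. The (op, LBA, status) tuples are transcribed
by hand from the trace. The 512-byte block size is an assumption: the
trace doesn't state it, but it is consistent with the LBA stride of
0x400 per 524288-byte READ(10). The find_retries helper is hypothetical.

```python
# Find commands that did not end with STATUS(GOOD) and the later command
# that re-reads the same logical block address.

BLOCK_SIZE = 512  # assumption, not stated in the trace

# (op, lba, status) transcribed from the trace above
TRACE = [
    (80, 0x0928E800, "GOOD"),
    (81, 0x0928EC00, "GOOD"),
    (82, 0x0928F000, "GOOD"),
    (83, 0x0928F400, "GOOD"),
    (84, 0x0928F800, "GOOD"),
    (85, 0x0928FC00, "MISSING"),          # transaction error
    (99, 0x09290000, "OVERLAPPED TAG"),   # first command after CLEAR_FEATURE
    (100, 0x09290400, "GOOD"),
    (101, 0x0928FC00, "GOOD"),
    (102, 0x09290000, "GOOD"),
    (103, 0x09290800, "GOOD"),
    (104, 0x09290C00, "GOOD"),
    (105, 0x09291000, "GOOD"),
    (106, 0x09291400, "GOOD"),
]


def find_retries(trace):
    """Map each failed op to the later op that reads the same LBA."""
    failed = {}   # lba -> op that failed at that lba
    retries = {}  # failed op -> op that retried it
    for op, lba, status in trace:
        if status != "GOOD":
            failed[lba] = op
        elif lba in failed:
            retries[failed.pop(lba)] = op
    return retries


print(hex(524288 // BLOCK_SIZE))  # 0x400
print(find_retries(TRACE))        # {85: 101, 99: 102}
```

Op(85) at LBA 0x0928FC00 is re-read from the start by Op(101), and the
Op(99) that got OVERLAPPED TAG is re-read by Op(102). Neither retry
picks up at an offset inside the failed transfer.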



BR,
Thinh
