lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <329f68fb-a097-ff3d-da9d-f535a8429ea7@synopsys.com>
Date: Tue, 13 May 2025 07:35:40 +0000
From: Minas Harutyunyan <Minas.Harutyunyan@...opsys.com>
To: Luca Ceresoli <luca.ceresoli@...tlin.com>,
        Alan Stern
	<stern@...land.harvard.edu>
CC: "linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
        Kever Yang
	<kever.yang@...k-chips.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Hervé Codina <herve.codina@...tlin.com>,
        Thomas Petazzoni
	<thomas.petazzoni@...tlin.com>,
        Stefan Wahren <wahrenst@....net>,
        Fabrice
 Gasnier <fabrice.gasnier@...s.st.com>
Subject: Re: DWC2 gadget: unexpected device reenumeration on Rockchip RK3308

Hi Luca,


On 5/9/25 11:17, Luca Ceresoli wrote:
> Hello Alan, Minas,
> 
> Minas: I am reporting new relevant findings in this e-mail and have
> questions for you below.
> 
> On Fri, 2 May 2025 13:56:01 -0400
> Alan Stern <stern@...land.harvard.edu> wrote:
> 
>> On Fri, May 02, 2025 at 03:53:08PM +0200, Luca Ceresoli wrote:
>>> Hello Alan,
>>>
>>> thanks for your continued support!
>>>
>>> On Tue, 15 Apr 2025 12:14:58 -0400
>>> Alan Stern <stern@...land.harvard.edu> wrote:
>>>
>>> [...]
>>>    
>>>>>>> It's quite possible that you're getting messed up by link power
>>>>>>> management (LPM).  But that's just a guess.
>>>>>
>>>>> What would be a symptom, if that happened?
>>>>
>>>> The debugging log wouldn't show much unless something went wrong.  You
>>>> could see if there are any files containing "lpm" in their names in the
>>>> /sys/bus/usb/devices/3-3.4/ directory (while the device is connected)
>>>> and what they contain.  Also, there's a way to disable LPM on the host
>>>> by setting a usbcore quirks module parameter:
>>>>
>>>> 	echo 1209:0001:k >/sys/module/usbcore/parameters/quirks
>>>
>>> Tried this. There is no file with 'lpm' in the name in
>>> /sys/bus/usb/devices/3-3.4/, and adding the quirk did not change the
>>> result: still a disconnect and reconnect in ~6 seconds.
>>
>> Okay, so LPM doesn't seem to be the reason.
> 
> I see, thanks for checking.
> 
>>>> You could also try connecting a usbmon trace for bus 3, showing what
>>>> happens during the initial connection and ensuing disconnection.  Any
>>>> LPM transitions would show up in the trace.
>>>
>>> Tried this, and here are the few lines before and after the 5~6 seconds
>>> delay.
>>>
>>> ffff99621e768840 4009009102 C Bi:1:009:3 0 2 = 696e
>>> ffff99621e768840 4009009104 S Bi:1:009:3 -115 256 <
>>> ffff99621e768300 4009009115 S Bi:1:009:3 -115 256 <
>>> ffff99621e768840 4009009144 C Bi:1:009:3 0 6 = 3a383534 2033
>>> ffff99621e768300 4009009155 C Bi:1:009:3 0 1 = 37
>>> ffff99621e768840 4009009178 C Bi:1:009:3 0 2 = 0d0a
>>> ffff99621e768840 4009009180 S Bi:1:009:3 -115 256 <
>>> ffff996080f11900 4009009361 C Ci:1:014:0 0 26 = 1a034300 44004300 20004100 43004d00 20004400 61007400 6100
>>> ffff99621e768300 4009009615 C Bi:1:009:3 0 3 = 5b2020
>>> ffff99621e768300 4009009624 S Bi:1:009:3 -115 256 <
>>> ffff99621e768840 4009009645 C Bi:1:009:3 0 3 = 203233
>>> ffff99621e768840 4009009646 S Bi:1:009:3 -115 256 <
>>> ffff99621e768300 4009009692 C Bi:1:009:3 0 4 = 2e383738
>>> ffff99621e768300 4009009694 S Bi:1:009:3 -115 256 <
>>> ffff99621e768840 4009009703 C Bi:1:009:3 0 2 = 3731
>>> ffff99621e768840 4009009722 S Bi:1:009:3 -115 256 <
>>> ffff99621e768840 4009009933 C Bi:1:009:3 0 2 = 7472
>>
>> It looks like device 9 (the lines containing :1:009:3) and device 14 are
>> unrelated to the problem; neither of them is your DWC2 device.
> 
> That's probably because I ha connected an entire USB HUB to the laptop,
> which had in turn a USB-serial adapter to access the console on the
> board headers. I understand this creates more noise, so I changed my
> setup later on to only connect the relevant cable.
> 
>>> <<< 6 seconds delay >>>
>>>
>>> ffff9960828e9540 4014796128 C Ii:1:001:1 0:2048 2 = 1000
>>> ffff9960828e9540 4014796145 S Ii:1:001:1 -115:2048 4 <
>>> ffff996080f11900 4014796162 S Ci:1:001:0 s a3 00 0000 0004 0004 4 <
>>> ffff996080f11900 4014796189 C Ci:1:001:0 0 4 = 00010100
>>
>> This shows the host system receiving a disconnect notification (for port
>> 4) from the hardware.  Which is odd, because earlier you said the device
>> was 3-3.4, indicating that it was plugged into a hub, not directly into
>> the host controller.  But the notification here comes from the host
>> controller.
>>
>> On the other hand, an even earlier email said that the device was 3-2,
>> indicating it _was_ plugged directly into the host controller
>>
>> Which means you've been changing your setup while running these tests.
>> Not a good idea.
> 
> I had to change laptop because reading usbmon debugfs files is not
> working on my main laptop. I still haven't figured out the reason, but
> on the other laptop it works, but unavoidably it changes the bus
> number. Sorry about that.
> 
>>> ffff996080f11900 4014796201 S Co:1:001:0 s 23 01 0010 0004 0000 0
>>> ffff996080f11900 4014796219 C Co:1:001:0 0 0
>>> ffff996080f11000 4014799627 S Ci:1:001:0 s a3 00 0000 0004 0004 4 <
>>> ffff996080f11000 4014799679 C Ci:1:001:0 0 4 = 00010000
>>> ffff996080f11000 4014826132 S Ci:1:001:0 s a3 00 0000 0004 0004 4 <
>>> ffff996080f11000 4014826166 C Ci:1:001:0 0 4 = 00010000
>>> ffff996080f11000 4014852075 S Ci:1:001:0 s a3 00 0000 0004 0004 4 <
>>> ffff996080f11000 4014852122 C Ci:1:001:0 0 4 = 00010000
>>> ffff996080f11000 4014878210 S Ci:1:001:0 s a3 00 0000 0004 0004 4 <
>>> ffff996080f11000 4014878253 C Ci:1:001:0 0 4 = 00010000
>>> ffff996080f11000 4014904049 S Ci:1:001:0 s a3 00 0000 0004 0004 4 <
>>> ffff996080f11000 4014904088 C Ci:1:001:0 0 4 = 00010000
>>> ffff9960828e9540 4014948427 C Ii:1:001:1 0:2048 2 = 1000
>>> ffff9960828e9540 4014948456 S Ii:1:001:1 -115:2048 4 <
>>> ffff99621e768300 4014948461 C Bi:1:009:3 0 2 = 5b20
>>> ffff99621e768300 4014948472 S Bi:1:009:3 -115 256 <
>>> ffff99621e768840 4014948488 C Bi:1:009:3 0 2 = 2020
>>> ffff99621e768840 4014948489 S Bi:1:009:3 -115 256 <
>>> ffff996080f11000 4014948522 S Ci:1:001:0 s a3 00 0000 0004 0004 4 <
>>> ffff99621e768300 4014948545 C Bi:1:009:3 0 58 = 32392e38 31373337 325d203e 3e3e2064 7763325f 68616e64 6c655f63 6f6d6d6f
>>> ffff99621e768300 4014948560 S Bi:1:009:3 -115 256 <
>>> ffff996080f11000 4014948607 C Ci:1:001:0 0 4 = 01010100
>>
>> And then about 150 ms later (the second column of the log is a
>> timestamp, in microseconds), a connection notification.  Nothing
>> preceding the disconnect to explain what caused it.
>>
>>> ffff99621e768840 4014948639 C Bi:1:009:3 0 10 = 37395d20 3e3e3e20 6477
>>> ffff99621e768840 4014948644 S Bi:1:009:3 -115 256 <
>>> ffff99621e768300 4014948657 C Bi:1:009:3 0 3 = 63325f
>>> ffff99621e768300 4014948663 S Bi:1:009:3 -115 256 <
>>> ffff99621e768840 4014948689 C Bi:1:009:3 0 5 = 68736f74 67
>>> ffff99621e768840 4014948693 S Bi:1:009:3 -115 256 <
>>> ffff99621e768300 4014948718 C Bi:1:009:3 0 2 = 5f69
>>> ffff99621e768300 4014948720 S Bi:1:009:3 -115 256 <
>>> ffff99621e768840 4014948759 C Bi:1:009:3 0 4 = 72713a33
>>
>> Unrelated material.  Evidently device 9 is running some sort of
>> serial connection, because everything it sends looks like ASCII
>> characters.
> 
> Perhaps the USB-serial I mentioned above, to access the board console.
> 
>>> However IIUC both the usbmon debugfs interface and Wireshark are unable
>>> to capture disconnection events because that's handled by the hardware.
>>> Correct?
>>
>> I'm not quite sure how to answer.  Yes, the hardware handles
>> disconnections -- because the hardware handles _everything_ that happens
>> on the USB bus.  And one of the things the hardware does when handling
>> disconnections is to tell the driver that one occurred; that's why the
>> report shows up in the usbmon (or Wireshark) trace.
>>
>> A USB analyzer could tell you exactly what's happening on the wire, but
>> they are expensive.  And in this case, I think all it would tell you is
>> what we already know: that a disconnect happened.
>>
>> The fact that the disconnects don't happen with the vendor kernel
>> indicates that they aren't caused by a hardware problem, such as a bad
>> cable link, but rather by something in the device's software, i.e., the
>> dwc2 driver.
>>
>> I don't know anything about that driver, though.  Minas is the expert.
>> You really need his advice.
> 
> In the meanwhile I did two event captures, one with the mainline kernel
> and one with the vendor kernel, using the same laptop setup and no hub
> in between, and for each test I captured both the usbmon log and a
> wireshark file. Both are available if needed.
> 
> By analyzing those captures I found that the communication between host
> and gadget is almost identical. The only differenceis the get
> configuration descriptor response has one more descriptor in the vendor
> case (the working one). Here it is:
> 
> OTG Descriptor:
>    bLength                 3
>    bDescriptorType         9
>    bmAttributes         0x03
>      SRP (Session Request Protocol)
>      HNP (Host Negotiation Protocol)
> 
> I don't know exacty what that implies, but for a quick test I went in
> the mainline kernel and found that it can add the same descriptor if
> both of these is true:
> 
>   * dr_mode = "otg" in device tree
>   * "DWC2 Mode Selection" is "Dual role mode" in kconfig
>     (i.e. CONFIG_USB_DWC2_DUAL_ROLE=y)
> 
> While I had:
> 
>   * dr_mode = "peripheral"
>   * "DWC2 Mode Selection" = "Gadget only mode"
>     (i.e. CONFIG_USB_DWC2_PERIPHERAL=y)
> 
> With those two changes the mainline kernel now behaves correctly, just
> like the vendor kernel. No more disconnection after 5-6 seconds.
> 
> For the records, the vendor kernel already had dr_mode = "otg" and
> CONFIG_USB_DWC2_DUAL_ROLE=y.
> 
> Based on my very limited knowledge of USB, intuitively it looks that:
> 
>   * in peripheral-only mode the OTG Descriptor should not be sent
>   * in peripheral-only mode SRP does not make sense
>   * in peripheral-only mode HNP does not make sense
> 
> Are the above correct?
> 
> Whether the answer, I think these new findings do not yet explain the
> problem nor point to a correct solution. Apart from the added
> descriptor, all of the initial enumeration events seen by usbmon is
> identical in the two cases.
> 
> Minas, were you able to have a look at the info I collected?
> Do they suggesting you anything about the dwc2 driver?
> 
Configuration parameters: CONFIG_USB_DWC2_HOST, 
CONFIG_USB_DWC2_PERIPHERAL and CONFIG_USB_DWC2_DUAL_ROLE have impact 
only on build process. Based on these parameters driver can build as 
host only, device only or host + device.

OTG functionality of depend on:
1. On core configuration - GHWCFG2 bits 0:2:
Mode of Operation (OtgMode)
3'b000: HNP- and SRP-Capable OTG (Host & Device)
3'b001: SRP-Capable OTG (Host & Device)
3'b010: Non-HNP and Non-SRP Capable OTG (Host and Device)
3'b011: SRP-Capable Device
3'b100: Non-OTG Device
3'b101: SRP-Capable Host
3'b110: Non-OTG Host
Others: Reserved
As you can see above, device only mode can support OTG, i.e. 
"SRP-capable device".
Based on provided OTG descriptor your core's OTG mode is equal to 0, 
which means "HNP- and SRP-Capable OTG (Host & Device)".
2. Depend on platform (see dwc2/param.c) OTG functionality can be 
updated, if it allowed by above core configuration OTG parameter.
3. OTG functionality can updated also through devicetree parameters 
settings.

Thanks,
Minas

> Best regards,
> Luca
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ