lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f1e7939d-f679-9229-c1b2-27e8087bf52a@gmail.com>
Date:   Tue, 15 Jun 2021 21:53:01 +0200
From:   Ferry Toth <fntoth@...il.com>
To:     Wesley Cheng <wcheng@...eaurora.org>,
        Andy Shevchenko <andy.shevchenko@...il.com>,
        Heikki Krogerus <heikki.krogerus@...ux.intel.com>
Cc:     Felipe Balbi <balbi@...nel.org>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        Andy Gross <agross@...nel.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Rob Herring <robh+dt@...nel.org>,
        USB <linux-usb@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-arm-msm@...r.kernel.org,
        devicetree <devicetree@...r.kernel.org>,
        Jack Pham <jackp@...eaurora.org>,
        Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
        John Youn <John.Youn@...opsys.com>
Subject: Re: [PATCH v9 0/5] Re-introduce TX FIFO resize for larger EP bursting

Hi

Op 15-06-2021 om 06:22 schreef Wesley Cheng:
>
> On 6/14/2021 12:30 PM, Ferry Toth wrote:
>> Op 14-06-2021 om 20:58 schreef Wesley Cheng:
>>> On 6/12/2021 2:27 PM, Ferry Toth wrote:
>>>> Hi
>>>>
>>>> Op 11-06-2021 om 15:21 schreef Andy Shevchenko:
>>>>> On Fri, Jun 11, 2021 at 4:14 PM Heikki Krogerus
>>>>> <heikki.krogerus@...ux.intel.com> wrote:
>>>>>> On Fri, Jun 11, 2021 at 04:00:38PM +0300, Felipe Balbi wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Wesley Cheng <wcheng@...eaurora.org> writes:
>>>>>>>>>>>>>> to be honest, I don't think these should go in (apart from
>>>>>>>>>>>>>> the build
>>>>>>>>>>>>>> failure) because it's likely to break instantiations of the
>>>>>>>>>>>>>> core with
>>>>>>>>>>>>>> differing FIFO sizes. Some instantiations even have some
>>>>>>>>>>>>>> endpoints with
>>>>>>>>>>>>>> dedicated functionality that requires the default FIFO size
>>>>>>>>>>>>>> configured
>>>>>>>>>>>>>> during coreConsultant instantiation. I know of at OMAP5 and
>>>>>>>>>>>>>> some Intel
>>>>>>>>>>>>>> implementations which have dedicated endpoints for processor
>>>>>>>>>>>>>> tracing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With OMAP5, these endpoints are configured at the top of the
>>>>>>>>>>>>>> available
>>>>>>>>>>>>>> endpoints, which means that if a gadget driver gets loaded
>>>>>>>>>>>>>> and takes
>>>>>>>>>>>>>> over most of the FIFO space because of this resizing,
>>>>>>>>>>>>>> processor tracing
>>>>>>>>>>>>>> will have a hard time running. That being said, processor
>>>>>>>>>>>>>> tracing isn't
>>>>>>>>>>>>>> supported in upstream at this moment.
>>>>>>>>>>>>>>
>>>>>>>>>>>> I agree that the application of this logic may differ between
>>>>>>>>>>>> vendors,
>>>>>>>>>>>> hence why I wanted to keep this controllable by the DT
>>>>>>>>>>>> property, so that
>>>>>>>>>>>> for those which do not support this use case can leave it
>>>>>>>>>>>> disabled.  The
>>>>>>>>>>>> logic is there to ensure that for a given USB configuration,
>>>>>>>>>>>> for each EP
>>>>>>>>>>>> it would have at least 1 TX FIFO.  For USB configurations which
>>>>>>>>>>>> don't
>>>>>>>>>>>> utilize all available IN EPs, it would allow re-allocation of
>>>>>>>>>>>> internal
>>>>>>>>>>>> memory to EPs which will actually be in use.
>>>>>>>>>>> The feature ends up being all-or-nothing, then :-) It sounds
>>>>>>>>>>> like we can
>>>>>>>>>>> be a little nicer in this regard.
>>>>>>>>>>>
>>>>>>>>>> Don't get me wrong, I think once those features become available
>>>>>>>>>> upstream, we can improve the logic.  From what I remember when
>>>>>>>>>> looking
>>>>>>>>> sure, I support that. But I want to make sure the first cut isn't
>>>>>>>>> likely
>>>>>>>>> to break things left and right :)
>>>>>>>>>
>>>>>>>>> Hence, let's at least get more testing.
>>>>>>>>>
>>>>>>>> Sure, I'd hope that the other users of DWC3 will also see some
>>>>>>>> pretty
>>>>>>>> big improvements on the TX path with this.
>>>>>>> fingers crossed
>>>>>>>
>>>>>>>>>> at Andy Shevchenko's Github, the Intel tracer downstream changes
>>>>>>>>>> were
>>>>>>>>>> just to remove physical EP1 and 2 from the DWC3 endpoint list.
>>>>>>>>>> If that
>>>>>>>>> right, that's the reason why we introduced the endpoint feature
>>>>>>>>> flags. The end goal was that the UDC would be able to have custom
>>>>>>>>> feature flags paired with ->validate_endpoint() or whatever before
>>>>>>>>> allowing it to be enabled. Then the UDC driver could tell UDC
>>>>>>>>> core to
>>>>>>>>> skip that endpoint on that particular platform without
>>>>>>>>> interefering with
>>>>>>>>> everything else.
>>>>>>>>>
>>>>>>>>> Of course, we still need to figure out a way to abstract the
>>>>>>>>> different
>>>>>>>>> dwc3 instantiations.
>>>>>>>>>
>>>>>>>>>> was the change which ended up upstream for the Intel tracer
>>>>>>>>>> then we
>>>>>>>>>> could improve the logic to avoid re-sizing those particular EPs.
>>>>>>>>> The problem then, just as I mentioned in the previous paragraph,
>>>>>>>>> will be
>>>>>>>>> coming up with a solution that's elegant and works for all
>>>>>>>>> different
>>>>>>>>> instantiations of dwc3 (or musb, cdns3, etc).
>>>>>>>>>
>>>>>>>> Well, at least for the TX FIFO resizing logic, we'd only be
>>>>>>>> needing to
>>>>>>>> focus on the DWC3 implementation.
>>>>>>>>
>>>>>>>> You bring up another good topic that I'll eventually needing to be
>>>>>>>> taking a look at, which is a nice way we can handle vendor specific
>>>>>>>> endpoints and how they can co-exist with other "normal"
>>>>>>>> endpoints.  We
>>>>>>>> have a few special HW eps as well, which we try to maintain
>>>>>>>> separately
>>>>>>>> in our DWC3 vendor driver, but it isn't the most convenient, or most
>>>>>>>> pretty method :).
>>>>>>> Awesome, as mentioned, the endpoint feature flags were added
>>>>>>> exactly to
>>>>>>> allow for these vendor-specific features :-)
>>>>>>>
>>>>>>> I'm more than happy to help testing now that I finally got our SM8150
>>>>>>> Surface Duo device tree accepted by Bjorn ;-)
>>>>>>>
>>>>>>>>>> However, I'm not sure how the changes would look like in the end,
>>>>>>>>>> so I
>>>>>>>>>> would like to wait later down the line to include that :).
>>>>>>>>> Fair enough, I agree. Can we get some more testing of $subject,
>>>>>>>>> though?
>>>>>>>>> Did you test $subject with upstream too? Which gadget drivers
>>>>>>>>> did you
>>>>>>>>> use? How did you test
>>>>>>>>>
>>>>>>>> The results that I included in the cover page was tested with the
>>>>>>>> pure
>>>>>>>> upstream kernel on our device.  Below was using the ConfigFS gadget
>>>>>>>> w/ a
>>>>>>>> mass storage only composition.
>>>>>>>>
>>>>>>>> Test Parameters:
>>>>>>>>     - Platform: Qualcomm SM8150
>>>>>>>>     - bMaxBurst = 6
>>>>>>>>     - USB req size = 256kB
>>>>>>>>     - Num of USB reqs = 16
>>>>>>> do you mind testing with the regular request size (16KiB) and 250
>>>>>>> requests? I think we can even do 15 bursts in that case.
>>>>>>>
>>>>>>>>     - USB Speed = Super-Speed
>>>>>>>>     - Function Driver: Mass Storage (w/ ramdisk)
>>>>>>>>     - Test Application: CrystalDiskMark
>>>>>>>>
>>>>>>>> Results:
>>>>>>>>
>>>>>>>> TXFIFO Depth = 3 max packets
>>>>>>>>
>>>>>>>> Test Case | Data Size | AVG tput (in MB/s)
>>>>>>>> -------------------------------------------
>>>>>>>> Sequential|1 GB x     |
>>>>>>>> Read      |9 loops    | 193.60
>>>>>>>>              |           | 195.86
>>>>>>>>              |           | 184.77
>>>>>>>>              |           | 193.60
>>>>>>>> -------------------------------------------
>>>>>>>>
>>>>>>>> TXFIFO Depth = 6 max packets
>>>>>>>>
>>>>>>>> Test Case | Data Size | AVG tput (in MB/s)
>>>>>>>> -------------------------------------------
>>>>>>>> Sequential|1 GB x     |
>>>>>>>> Read      |9 loops    | 287.35
>>>>>>>>            |           | 304.94
>>>>>>>>              |           | 289.64
>>>>>>>>              |           | 293.61
>>>>>>> I remember getting close to 400MiB/sec with Intel platforms without
>>>>>>> resizing FIFOs and I'm sure the FIFO size was set to 2x1024,
>>>>>>> though my
>>>>>>> memory could be failing.
>>>>>>>
>>>>>>> Then again, I never ran with CrystalDiskMark, I was using my own tool
>>>>>>> (it's somewhere in github. If you care, I can look up the URL).
>>>>>>>
>>>>>>>> We also have internal numbers which have shown similar
>>>>>>>> improvements as
>>>>>>>> well.  Those are over networking/tethering interfaces, so testing
>>>>>>>> IPERF
>>>>>>>> loopback over TCP/UDP.
>>>>>>> loopback iperf? That would skip the wire, no?
>>>>>>>
>>>>>>>>>> size of 2 and TX threshold of 1, this would really be not
>>>>>>>>>> beneficial to
>>>>>>>>>> us, because we can only change the TX threshold to 2 at max,
>>>>>>>>>> and at
>>>>>>>>>> least in my observations, once we have to go out to system
>>>>>>>>>> memory to
>>>>>>>>>> fetch the next data packet, that latency takes enough time for the
>>>>>>>>>> controller to end the current burst.
>>>>>>>>> What I noticed with g_mass_storage is that we can amortize the
>>>>>>>>> cost of
>>>>>>>>> fetching data from memory, with a deeper request queue. Whenever I
>>>>>>>>> test(ed) g_mass_storage, I was doing so with 250 requests. And
>>>>>>>>> that was
>>>>>>>>> enough to give me very good performance. Never had to poke at TX
>>>>>>>>> FIFO
>>>>>>>>> resizing. Did you try something like this too?
>>>>>>>>>
>>>>>>>>> I feel that allocating more requests is a far simpler and more
>>>>>>>>> generic
>>>>>>>>> method that changing FIFO sizes :)
>>>>>>>>>
>>>>>>>> I wish I had a USB bus trace handy to show you, which would make it
>>>>>>>> very
>>>>>>>> clear how the USB bus is currently utilized with TXFIFO size 2 vs
>>>>>>>> 6.  So
>>>>>>>> by increasing the number of USB requests, that will help if there
>>>>>>>> was a
>>>>>>>> bottleneck at the SW level where the application/function driver
>>>>>>>> utilizing the DWC3 was submitting data much faster than the HW was
>>>>>>>> processing them.
>>>>>>>>
>>>>>>>> So yes, this method of increasing the # of USB reqs will definitely
>>>>>>>> help
>>>>>>>> with situations such as HSUSB or in SSUSB when EP bursting isn't
>>>>>>>> used.
>>>>>>>> The TXFIFO resize comes into play for SSUSB, which utilizes endpoint
>>>>>>>> bursting.
>>>>>>> Hmm, that's not what I remember. Perhaps the TRB cache size plays a
>>>>>>> role
>>>>>>> here too. I have clear memories of testing this very scenario of
>>>>>>> bursting (using g_mass_storage at the time) because I was curious
>>>>>>> about
>>>>>>> it. Back then, my tests showed no difference in behavior.
>>>>>>>
>>>>>>> It could be nice if Heikki could test Intel parts with and without
>>>>>>> your
>>>>>>> changes on g_mass_storage with 250 requests.
>>>>>> Andy, you have a system at hand that has the DWC3 block enabled,
>>>>>> right? Can you help out here?
>>>>> I'm not sure if i will have time soon, I Cc'ed to Ferry who has a few
>>>>> more test cases (I have only one or two) and maybe can help. But I'll
>>>>> keep this in mind.
>>>> I just tested on 5.13.0-rc4 on Intel Edison (x86_64). All 5 patches
>>>> apply. Switching between host/gadget works, no connections dropping, no
>>>> errors in dmesg.
>>>>
>>>> In host mode I connect a smsc9504 eth+4p hub. In gadget mode I have
>>>> composite device created from configfs with gser / eem / mass_storage /
>>>> uac2.
>>>>
>>>> Tested with iperf3 performance in host (93.6Mbits/sec) and gadget
>>>> (207Mbits/sec) mode. Compared to v5.10.41 without patches host
>>>> (93.4Mbits/sec) and gadget (198Mbits/sec).
>>>>
>>>> Gadget seems to be a little faster with the patches, but that might also
>>>> be caused  by something else, on v5.10.41 I see the bitrate bouncing
>>>> between 207 and 199.
>>>>
>>>> I saw a mention to test iperf3 to self (loopback). 3.09 Gbits/sec. With
>>>> v5.10.41 3.07Gbits/sec. Not bad for a 500MHz device.
>>>>
>>>> With gnome-disks I did a read access benchmark 35.4MB/s, with v5.10.41
>>>> 34.7MB/s. This might be limited by Edison's internal eMMC speed (when
>>>> booting U-Boot reads the kernel with 21.4 MiB/s).
>>>>
>>> Hi Ferry,
>>>
>>> Thanks for the testing.  Just to double check, did you also enable the
>>> property, which enabled the TXFIFO resize feature on the platform?  For
>>> example, for the QCOM SM8150 platform, we're adding the following to our
>>> device tree node:
>>>
>>> tx-fifo-resize
>>>
>>> If not, then your results at least confirms that w/o the property
>>> present, the changes won't break anything :).  Thanks again for the
>>> initial testing!

I applied the patch now to 5.13.0-rc5 + the following:

--- a/drivers/usb/dwc3/dwc3-pci.c
+++ b/drivers/usb/dwc3/dwc3-pci.c
@@ -124,6 +124,7 @@ static const struct property_entry 
dwc3_pci_mrfld_properties[] = {
      PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
      PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
      PROPERTY_ENTRY_BOOL("snps,usb2-gadget-lpm-disable"),
+    PROPERTY_ENTRY_BOOL("tx-fifo-resize"),
      PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
      {}
  };

  and when switching to gadget mode unfortunately received the following 
oops:

BUG: unable to handle page fault for address: 00000000202043f2
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 617 Comm: conf-gadget.sh Not tainted 
5.13.0-rc5-edison-acpi-standard #1
Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 
2015.01.21:18.19.48
RIP: 0010:dwc3_gadget_check_config+0x33/0x80
Code: 59 04 00 00 04 74 61 48 c1 ee 10 48 89 f7 f3 48 0f b8 c7 48 89 c7 
39 81 60 04 00 00 7d 4a 89 81 60 04 00 00 8b 81 08 04 00 00 <81> b8 e8 
03 00 00 32 33 00 00 0f b6 b0 09 04 00 00 75 0d 8b 80 20
RSP: 0018:ffffb5550038fda0 EFLAGS: 00010297
RAX: 000000002020400a RBX: ffffa04502627348 RCX: ffffa04507354028
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000004
RBP: ffffa04508ac0550 R08: ffffa04503a75b2c R09: 0000000000000000
R10: 0000000000000216 R11: 000000000002eba0 R12: ffffa04508ac0550
R13: dead000000000100 R14: ffffa04508ac0600 R15: ffffa04508ac0520
FS:  00007f7471e2f740(0000) GS:ffffa0453e200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000202043f2 CR3: 0000000003f38000 CR4: 00000000001006f0
Call Trace:
  configfs_composite_bind+0x2f4/0x430 [libcomposite]
  udc_bind_to_driver+0x64/0x180
  usb_gadget_probe_driver+0x114/0x150
  gadget_dev_desc_UDC_store+0xbc/0x130 [libcomposite]
  configfs_write_file+0xcd/0x140
  vfs_write+0xbb/0x250
  ksys_write+0x5a/0xd0
  do_syscall_64+0x40/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7471f1ff53
Code: 8b 15 21 cf 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 
00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
RSP: 002b:00007fffa3dcd328 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f7471f1ff53
RDX: 000000000000000c RSI: 00005614d615a770 RDI: 0000000000000001
RBP: 00005614d615a770 R08: 000000000000000a R09: 00007f7471fb20c0
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007f7471fee520 R14: 000000000000000c R15: 00007f7471fee720
Modules linked in: usb_f_uac2 u_audio usb_f_mass_storage usb_f_eem 
u_ether usb_f_serial u_serial libcomposite rfcomm iptable_nat bnep 
snd_sof_nocodec spi_pxa2xx_platform dw_dmac smsc snd_sof_pci_intel_tng 
snd_sof_pci snd_sof_acpi_intel_byt snd_sof_intel_ipc snd_sof_acpi 
smsc95xx snd_sof pwm_lpss_pci pwm_lpss snd_sof_xtensa_dsp 
snd_intel_dspcfg snd_soc_acpi_intel_match snd_soc_acpi dw_dmac_pci 
intel_mrfld_pwrbtn intel_mrfld_adc dw_dmac_core spi_pxa2xx_pci brcmfmac 
brcmutil leds_gpio hci_uart btbcm ti_ads7950 
industrialio_triggered_buffer kfifo_buf ledtrig_timer ledtrig_heartbeat 
mmc_block extcon_intel_mrfld sdhci_pci cqhci sdhci led_class 
intel_soc_pmic_mrfld mmc_core btrfs libcrc32c xor zstd_compress 
zlib_deflate raid6_pq
CR2: 00000000202043f2
---[ end trace 5c11fe50dca92ad4 ]---

>> No I didn't. Afaik we don't have a devicetree property to set.
>>
>> But I'd be happy to test that as well. But where to set the property?
>>
>> dwc3_pci_mrfld_properties[] in dwc3-pci?
>>
> Hi Ferry,
>
> Not too sure which DWC3 driver is used for the Intel platform, but I
> believe that should be the one. (if that's what is normally used)  We'd
> just need to add an entry w/ the below:
>
> PROPERTY_ENTRY_BOOL("tx-fifo-resize")
>
> Thanks
> Wesley Cheng
>
>>> Thanks
>>> Wesley Cheng
>>>
>>>>>>>> Now with endpoint bursting, if the function notifies the host that
>>>>>>>> bursting is supported, when the host sends the ACK for the Data
>>>>>>>> Packet,
>>>>>>>> it should have a NumP value equal to the bMaxBurst reported in
>>>>>>>> the EP
>>>>>>> Yes and no. Looking back at the history, we used to configure NUMP
>>>>>>> based
>>>>>>> on bMaxBurst, but it was changed later in commit
>>>>>>> 4e99472bc10bda9906526d725ff6d5f27b4ddca1 by yours truly because of a
>>>>>>> problem reported by John Youn.
>>>>>>>
>>>>>>> And now we've come full circle. Because even if I believe more
>>>>>>> requests
>>>>>>> are enough for bursting, NUMP is limited by the RxFIFO size. This
>>>>>>> ends
>>>>>>> up supporting your claim that we need RxFIFO resizing if we want to
>>>>>>> squeeze more throughput out of the controller.
>>>>>>>
>>>>>>> However, note that this is about RxFIFO size, not TxFIFO size. In
>>>>>>> fact,
>>>>>>> looking at Table 8-13 of USB 3.1 r1.0, we read the following about
>>>>>>> NumP
>>>>>>> (emphasis is mine):
>>>>>>>
>>>>>>>          "Number of Packets (NumP). This field is used to indicate the
>>>>>>>          number of Data Packet buffers that the **receiver** can
>>>>>>>          accept. The value in this field shall be less than or
>>>>>>> equal to
>>>>>>>          the maximum burst size supported by the endpoint as
>>>>>>> determined
>>>>>>>          by the value in the bMaxBurst field in the Endpoint Companion
>>>>>>>          Descriptor (refer to Section 9.6.7)."
>>>>>>>
>>>>>>> So, NumP is for the receiver, not the transmitter. Could you clarify
>>>>>>> what you mean here?
>>>>>>>
>>>>>>> /me keeps reading
>>>>>>>
>>>>>>> Hmm, table 8-15 tries to clarify:
>>>>>>>
>>>>>>>          "Number of Packets (NumP).
>>>>>>>
>>>>>>>          For an OUT endpoint, refer to Table 8-13 for the
>>>>>>> description of
>>>>>>>          this field.
>>>>>>>
>>>>>>>          For an IN endpoint this field is set by the endpoint to the
>>>>>>>          number of packets it can transmit when the host resumes
>>>>>>>          transactions to it. This field shall not have a value greater
>>>>>>>          than the maximum burst size supported by the endpoint as
>>>>>>>          indicated by the value in the bMaxBurst field in the Endpoint
>>>>>>>          Companion Descriptor. Note that the value reported in this
>>>>>>> field
>>>>>>>          may be treated by the host as informative only."
>>>>>>>
>>>>>>> However, if I remember correctly (please verify dwc3 databook),
>>>>>>> NUMP in
>>>>>>> DCFG was only for receive buffers. Thin, John, how does dwc3 compute
>>>>>>> NumP for TX/IN endpoints? Is that computed as a function of
>>>>>>> DCFG.NUMP or
>>>>>>> TxFIFO size?
>>>>>>>
>>>>>>>> desc.  If we have a TXFIFO size of 2, then normally what I have
>>>>>>>> seen is
>>>>>>>> that after 2 data packets, the device issues a NRDY.  So then we'd
>>>>>>>> need
>>>>>>>> to send an ERDY once data is available within the FIFO, and the same
>>>>>>>> sequence happens until the USB request is complete.  With this
>>>>>>>> constant
>>>>>>>> NRDY/ERDY handshake going on, you actually see that the bus is under
>>>>>>>> utilized.  When we increase an EP's FIFO size, then you'll see
>>>>>>>> constant
>>>>>>>> bursts for a request, until the request is done, or if the host
>>>>>>>> runs out
>>>>>>>> of RXFIFO. (ie no interruption [on the USB protocol level] during
>>>>>>>> USB
>>>>>>>> request data transfer)
>>>>>>> Unfortunately I don't have access to a USB sniffer anymore :-(
>>>>>>>
>>>>>>>>>>>>> Good points.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wesley, what kind of testing have you done on this on
>>>>>>>>>>>>> different devices?
>>>>>>>>>>>>>
>>>>>>>>>>>> As mentioned above, these changes are currently present on end
>>>>>>>>>>>> user
>>>>>>>>>>>> devices for the past few years, so its been through a lot of
>>>>>>>>>>>> testing :).
>>>>>>>>>>> all with the same gadget driver. Also, who uses USB on android
>>>>>>>>>>> devices
>>>>>>>>>>> these days? Most of the data transfer goes via WiFi or
>>>>>>>>>>> Bluetooth, anyway
>>>>>>>>>>> :-)
>>>>>>>>>>>
>>>>>>>>>>> I guess only developers are using USB during development to
>>>>>>>>>>> flash dev
>>>>>>>>>>> images heh.
>>>>>>>>>>>
>>>>>>>>>> I used to be a customer facing engineer, so honestly I did see
>>>>>>>>>> some
>>>>>>>>>> really interesting and crazy designs.  Again, we do have
>>>>>>>>>> non-Android
>>>>>>>>>> products that use the same code, and it has been working in there
>>>>>>>>>> for a
>>>>>>>>>> few years as well.  The TXFIFO sizing really has helped with
>>>>>>>>>> multimedia
>>>>>>>>>> use cases, which use isoc endpoints, since esp. in those lower
>>>>>>>>>> end CPU
>>>>>>>>>> chips where latencies across the system are much larger, and a
>>>>>>>>>> missed
>>>>>>>>>> ISOC interval leads to a pop in your ear.
>>>>>>>>> This is good background information. Thanks for bringing this
>>>>>>>>> up. Admitedly, we still have ISOC issues with dwc3. I'm
>>>>>>>>> interested in
>>>>>>>>> knowing if a deeper request queue would also help here.
>>>>>>>>>
>>>>>>>>> Remember dwc3 can accomodate 255 requests + link for each
>>>>>>>>> endpoint. If
>>>>>>>>> our gadget driver uses a low number of requests, we're never really
>>>>>>>>> using the TRB ring in our benefit.
>>>>>>>>>
>>>>>>>> We're actually using both a deeper USB request queue + TX fifo
>>>>>>>> resizing. :).
>>>>>>> okay, great. Let's see what John and/or Thinh respond WRT dwc3 TX
>>>>>>> Burst
>>>>>>> behavior.
>>>>>> -- 
>>>>>> heikki

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ