[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <64ffb6e4-d0c9-4e53-a314-a174097516b1@gmail.com>
Date: Thu, 2 May 2024 22:13:43 +0200
From: Ferry Toth <fntoth@...il.com>
To: Hardik Gajjar <hgajjar@...adit-jv.com>
Cc: Andy Shevchenko <andriy.shevchenko@...el.com>,
gregkh@...uxfoundation.org, s.hauer@...gutronix.de, jonathanh@...dia.com,
linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org,
quic_linyyuan@...cinc.com, paul@...pouillou.net, quic_eserrao@...cinc.com,
erosca@...adit-jv.com
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with
netif_device_detach
Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>> Hi,
>>
>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>> Hi,
>>>
>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>> Hi,
>>>>>
>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>> Hi,
>>>>>>
>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>> no
>>>>>>>>>>>>> longer exists
>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>> symlink should become active again
>>>>>>>>>
>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>
>>>>>>>>> root@...a:~# ls
>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>> driver net power sound subsystem suspended uevent
>>>>>>>>>
>>>>>>>>> Then switching to host mode:
>>>>>>>>>
>>>>>>>>> root@...a:~# ls
>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>> ls: cannot access
>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>> No such file
>>>>>>>>> or directory
>>>>>>>>>
>>>>>>>>> Then back to device mode:
>>>>>>>>>
>>>>>>>>> root@...a:~# ls
>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>> driver power sound subsystem suspended uevent
>>>>>>>>>
>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>
>>>>>>>>> root@...a:~# ping 10.42.0.1
>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>
>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>
>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>> two changes?
>>>>>>>
>>>>>>>
>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>> I could not reproduce
>>>>>>> the issue.
>>>>>>>
>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>
>>>>>>> Following are the logs:
>>>>>>> //Host to device
>>>>>>>
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>> mode
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>> a800000.dwc3/gadget/net/
>>>>>>> usb0
>>>>>>>
>>>>>>> //device to host
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>> a800000.dwc3/gadget/net/
>>>>>>> usb0
>>>>>>
>>>>>> That is weird. When I switch to host mode (using the physical switch),
>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>
>>>>>> Switching back to device mode, that gadget directory is recreated. And
>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>
>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>> <
>>>>>>> addr_assign_type duplex phys_port_name
>>>>>>> addr_len flags phys_switch_id
>>>>>>> address gro_flush_timeout power
>>>>>>> broadcast ifalias proto_down
>>>>>>> carrier ifindex queues
>>>>>>> carrier_changes iflink speed
>>>>>>> carrier_down_count link_mode statistics
>>>>>>> carrier_up_count mtu subsystem
>>>>>>> dev_id name_assign_type tx_queue_len
>>>>>>> dev_port netdev_group type
>>>>>>> device operstate uevent
>>>>>>> dormant phys_port_id waiting_for_supplier
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a
>>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1
>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>> collisions:0 txqueuelen:1000
>>>>>>> RX bytes:0 TX bytes:0
>>>>>>>
>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>
>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>> the usb0 interface remains available.
>>>>>>
>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>
>>>>>> I have while in host mode:
>>>>>> root@...a:~# ifconfig -a usb0
>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500
>>>>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast
>>>>>> 10.42.0.255
>>>>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64
>>>>>> scopeid 0x20<link>
>>>>>>
>>>>>>
>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>
>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved
>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>
>>>>>> Switching from device -> host -> device:
>>>>>>
>>>>>> root@...a:~# trace-cmd record -p function_graph -l *gether_*
>>>>>> plugin 'function_graph'
>>>>>> Hit Ctrl^C to stop recording
>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>> 188 bytes in size (4096 uncompressed)
>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>> 0 bytes in size (0 uncompressed)
>>>>>> root@...a:~# trace-cmd report
>>>>>> cpus=2
>>>>>> irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: #
>>>>>> 2079.480 us | gether_disconnect();
>>>>>> irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: +
>>>>>> 11.640 us | gether_disconnect();
>>>>>> irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: !
>>>>>> 116.520 us | gether_connect();
>>>>>> irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: #
>>>>>> 2057.260 us | gether_disconnect();
>>>>>> irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: !
>>>>>> 109.000 us | gether_connect();
>>>>>
>>>>> I tried to get a more useful trace:
>>>>> root@...a:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>> root@...a:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>> root@...a:/sys/kernel/tracing# echo function > current_tracer
>>>>> root@...a:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter
>>>>> -> switch to host mode then back to device
>>>>> root@...a:/sys/kernel/tracing# cat trace
>>>>> # tracer: function
>>>>> #
>>>>> # entries-in-buffer/entries-written: 53/53 #P:2
>>>>> #
>>>>> # _-----=> irqs-off/BH-disabled
>>>>> # / _----=> need-resched
>>>>> # | / _---=> hardirq/softirq
>>>>> # || / _--=> preempt-depth
>>>>> # ||| / _-=> migrate-disable
>>>>> # |||| / delay
>>>>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
>>>>> # | | | ||||| | |
>>>>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config
>>>>> <-__composite_disconnect
>>>>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable
>>>>> <-reset_config
>>>>> irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect
>>>>> <-reset_config
>>>>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind
>>>>> <-purge_configs_funcs
>>>>>
>>>>> -> to device mode
>>>>>
>>>>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind
>>>>> <-usb_add_function
>>>>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt
>>>>> <-composite_setup
>>>>> irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect
>>>>> <-eem_set_alt
>>>>> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect
>>>>> <-eem_set_alt
>>>>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt
>>>>> <-composite_setup
>>>>> irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect
>>>>> <-eem_set_alt
>>>>> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect
>>>>> <-eem_set_alt
>>>>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap
>>>>> <-rx_complete
>>>>> ...
>>>>>
>>>>> I don't know where to look exactly. Any hints?
>>>>
>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>
>>> Nope. It's a pitty that the trace formatting got messed up above. But as
>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>> function is called, until I flip the switch to device mode.
>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete
>>> and eem_wrap <-eth_start_xmit lines.
>>>
>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>> network side to check who deleting the sysfs entry
>>>
>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>> that would be related to the patch we're discussing here. I was hoping
>>> for a little more targeted hint.
>>
>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*',
>> 'reset_config' and 'device_remove_file' leads me to:
>>
>> TIMESTAMP FUNCTION
>> | |
>> 49.952477: eem_wrap <-eth_start_xmit
>> 55.072455: eem_wrap <-eth_start_xmit
>> 55.072621: eem_unwrap <-rx_complete
>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>> 59.011545: composite_reset <-configfs_composite_reset
>> 59.011548: reset_config <-__composite_disconnect
>> 59.013565: eem_disable <-reset_config
>> 59.013567: gether_disconnect <-reset_config
>> 59.049560: device_remove_file <-device_remove
>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco
>> 59.051189: composite_disconnect <-configfs_composite_discon
>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver
>> 59.052519: eem_unbind <-purge_configs_funcs
>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin
>> 59.052537: device_remove_file <-composite_dev_cleanup
>>
>> device_remove_file gets called twice, once by device_remove after
>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>> (removing the gadget)
>
> I believe that the device_remove_file function is only removing suspend-specific attributes, not the complete gadget.
> Typically, when you perform the role switch, the Gadget start/stop function in your UDC driver is called. These functions should not delete the gadget
>
> To investigate further, could you please enable the DWC3 functions in ftrace and check who is removing the gadget?
> I can also enable this on my system and compare the logs with yours, but I will be in PI planning for 1.5 weeks and may not be able to provide immediate support.
Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
trace is 600kb. I cut uninteresting stuff before
configfs_composite_reset <-usb_gadget_udc_reset and after
__dwc3_set_mode, < 300 lines remain. See attached tar.gz for you to compare.
> Additionally, please check if you have any customized DWC patches that may be causing this problem.
>
>>
>>> You may recall the whole issue did not occur before this patch got applied.
>>>
>>>> Hardik
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>> kobject and sysfs interface.
>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>> is not followed in your use case or driver ?
>>>>>>
>>>>>> Yes, I'll try to trace where that happens.
>>>>>>
>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>> switch is not good, as the route remains.
>>>>>>
>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>> happens if you make eem connection and then unplug?
>>>>>>
>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>> kernel's gadget architecture.
>>>>>
>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>
>>>>>>>>
>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>> with this patch and
>>>>>>>>>>> share result here. May be we need some fix in dwc3
>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>> controller as well. But
>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>
>>>>>>>> --
>>>>>>>> With Best Regards,
>>>>>>>> Andy Shevchenko
>>>>>>>>
>>>>>>>>
>>>>>>
>>>
Download attachment "trace-u_ether.tar.gz" of type "application/gzip" (1484 bytes)
Powered by blists - more mailing lists