lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7013cad9-7331-45be-b115-e9e1f253bcea@gmail.com>
Date: Thu, 2 May 2024 22:32:16 +0200
From: Ferry Toth <fntoth@...il.com>
To: Hardik Gajjar <hgajjar@...adit-jv.com>
Cc: Andy Shevchenko <andriy.shevchenko@...el.com>,
 gregkh@...uxfoundation.org, s.hauer@...gutronix.de, jonathanh@...dia.com,
 linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org,
 quic_linyyuan@...cinc.com, paul@...pouillou.net, quic_eserrao@...cinc.com,
 erosca@...adit-jv.com
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with
 netif_device_detach

Oops, sorry, wrong file attached . Now correct one.

Op 02-05-2024 om 22:13 schreef Ferry Toth:
> Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>> Hi,
>>>
>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>> Hi,
>>>>
>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>> target 
>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>> no
>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>
>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>
>>>>>>>>>> root@...a:~# ls
>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>
>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>
>>>>>>>>>> root@...a:~# ls
>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>> ls: cannot access
>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>> No such file
>>>>>>>>>> or directory
>>>>>>>>>>
>>>>>>>>>> Then back to device mode:
>>>>>>>>>>
>>>>>>>>>> root@...a:~# ls
>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>
>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>
>>>>>>>>>> root@...a:~# ping 10.42.0.1
>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>
>>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>>
>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>>> two changes?
>>>>>>>>
>>>>>>>>
>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>>> I could not reproduce
>>>>>>>> the issue.
>>>>>>>>
>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>
>>>>>>>> Following are the logs:
>>>>>>>> //Host to device
>>>>>>>>
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>>> mode
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>> usb0
>>>>>>>>
>>>>>>>> //device to host
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > 
>>>>>>>> mode
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>> usb0
>>>>>>>
>>>>>>> That is weird. When I switch to host mode (using the physical 
>>>>>>> switch),
>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>
>>>>>>> Switching back to device mode, that gadget directory is 
>>>>>>> recreated. And
>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>
>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>> <
>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>> address             gro_flush_timeout  power
>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>> carrier             ifindex            queues
>>>>>>>> carrier_changes     iflink             speed
>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>> dev_port            netdev_group       type
>>>>>>>> device              operstate          uevent
>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>              BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>>              collisions:0 txqueuelen:1000
>>>>>>>>              RX bytes:0 TX bytes:0
>>>>>>>>
>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>
>>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>>> the usb0 interface remains available.
>>>>>>>
>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>
>>>>>>> I have while in host mode:
>>>>>>> root@...a:~# ifconfig -a usb0
>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>>            inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>> 10.42.0.255
>>>>>>>            inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>>> scopeid 0x20<link>
>>>>>>>
>>>>>>>
>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>
>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions 
>>>>>>>> involved
>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>
>>>>>>> Switching from device -> host -> device:
>>>>>>>
>>>>>>> root@...a:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>      plugin 'function_graph'
>>>>>>> Hit Ctrl^C to stop recording
>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>        188 bytes in size (4096 uncompressed)
>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>        0 bytes in size (0 uncompressed)
>>>>>>> root@...a:~# trace-cmd report
>>>>>>> cpus=2
>>>>>>>         irq/68-dwc3-725   [000]   514.575337: 
>>>>>>> funcgraph_entry:      #
>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>         irq/68-dwc3-946   [000]   524.263731: 
>>>>>>> funcgraph_entry:      +
>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>         irq/68-dwc3-946   [000]   524.263743: 
>>>>>>> funcgraph_entry:      !
>>>>>>> 116.520 us |  gether_connect();
>>>>>>>         irq/68-dwc3-946   [000]   524.268029: 
>>>>>>> funcgraph_entry:      #
>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>         irq/68-dwc3-946   [000]   524.270089: 
>>>>>>> funcgraph_entry:      !
>>>>>>> 109.000 us |  gether_connect();
>>>>>>
>>>>>> I tried to get a more useful trace:
>>>>>> root@...a:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>> root@...a:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>> root@...a:/sys/kernel/tracing# echo function > current_tracer
>>>>>> root@...a:/sys/kernel/tracing# echo 'reset_config' >> 
>>>>>> set_ftrace_filter
>>>>>> -> switch to host mode then back to device
>>>>>> root@...a:/sys/kernel/tracing# cat trace
>>>>>> # tracer: function
>>>>>> #
>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>> #
>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>> #                               / _----=> need-resched
>>>>>> #                              | / _---=> hardirq/softirq
>>>>>> #                              || / _--=> preempt-depth
>>>>>> #                              ||| / _-=> migrate-disable
>>>>>> #                              |||| /     delay
>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>> #              | |         |   |||||     |         |
>>>>>>        irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>> <-__composite_disconnect
>>>>>>        irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>> <-reset_config
>>>>>>        irq/68-dwc3-523     [000] D..3.   133.992276: 
>>>>>> gether_disconnect
>>>>>> <-reset_config
>>>>>>        kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>> <-purge_configs_funcs
>>>>>>
>>>>>> -> to device mode
>>>>>>
>>>>>>        kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>> <-usb_add_function
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>> <-composite_setup
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.155215: 
>>>>>> gether_disconnect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>> <-composite_setup
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.157292: 
>>>>>> gether_disconnect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>>> <-eem_set_alt
>>>>>>        irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>> <-rx_complete
>>>>>> ...
>>>>>>
>>>>>> I don't know where to look exactly. Any hints?
>>>>>
>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>>
>>>> Nope. It's a pitty that the trace formatting got messed up above. 
>>>> But as
>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>> function is called, until I flip the switch to device mode.
>>>> The ... at the end is where I cut uninteresting eem_unwrap 
>>>> <-rx_complete
>>>> and eem_wrap <-eth_start_xmit lines.
>>>>
>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>> network side to check who deleting the sysfs entry
>>>>
>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>> that would be related to the patch we're discussing here. I was hoping
>>>> for a little more targeted hint.
>>>
>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 
>>> 'usb_fun*',
>>> 'reset_config' and 'device_remove_file' leads me to:
>>>
>>> TIMESTAMP  FUNCTION
>>>     |         |
>>>    49.952477: eem_wrap <-eth_start_xmit
>>>    55.072455: eem_wrap <-eth_start_xmit
>>>    55.072621: eem_unwrap <-rx_complete
>>>    59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>    59.011545: composite_reset <-configfs_composite_reset
>>>    59.011548: reset_config <-__composite_disconnect
>>>    59.013565: eem_disable <-reset_config
>>>    59.013567: gether_disconnect <-reset_config
>>>    59.049560: device_remove_file <-device_remove
>>>    59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>    59.051189: composite_disconnect <-configfs_composite_discon
>>>    59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>    59.052519: eem_unbind <-purge_configs_funcs
>>>    59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>    59.052537: device_remove_file <-composite_dev_cleanup
>>>
>>> device_remove_file gets called twice, once by device_remove after
>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>>> (removing the gadget)
>>
>> I believe that the device_remove_file function is only removing 
>> suspend-specific attributes, not the complete gadget.
>> Typically, when you perform the role switch, the Gadget start/stop 
>> function in your UDC driver is called. These functions should not 
>> delete the gadget
>>
>> To investigate further, could you please enable the DWC3 functions in 
>> ftrace and check who is removing the gadget?
>> I can also enable this on my system and compare the logs with yours, 
>> but I will be in PI planning for 1.5 weeks and may not be able to 
>> provide immediate support.
> 
> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so 
> trace is 600kb. I cut uninteresting stuff before 
> configfs_composite_reset <-usb_gadget_udc_reset and after 
> __dwc3_set_mode, < 300 lines remain. See attached tar.gz for you to 
> compare.
> 
>> Additionally, please check if you have any customized DWC patches that 
>> may be causing this problem.
>>
>>>
>>>> You may recall the whole issue did not occur before this patch got 
>>>> applied.
>>>>
>>>>> Hardik
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>> kobject and sysfs interface.
>>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>>> is not followed in your use case or driver ?
>>>>>>>
>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>
>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>>> switch is not good, as the route remains.
>>>>>>>
>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>
>>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>>> kernel's gadget architecture.
>>>>>>
>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>
>>>>>>>>>
>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>> with this patch and
>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>> controller as well. But
>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> With Best Regards,
>>>>>>>>> Andy Shevchenko
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>
Download attachment "trace_u_ether.tar.gz" of type "application/gzip" (2501 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ