lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6f7b11ed-e387-4d51-b0f4-3af8d4e32a16@gmail.com>
Date: Wed, 15 May 2024 20:38:30 +0200
From: Ferry Toth <fntoth@...il.com>
To: Hardik Gajjar <hgajjar@...adit-jv.com>
Cc: Andy Shevchenko <andriy.shevchenko@...el.com>,
 gregkh@...uxfoundation.org, s.hauer@...gutronix.de, jonathanh@...dia.com,
 linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org,
 quic_linyyuan@...cinc.com, paul@...pouillou.net, quic_eserrao@...cinc.com,
 erosca@...adit-jv.com, regressions@...mhuis.info
Subject: Re: [PATCH v4] usb: gadget: u_ether: Replace netif_stop_queue with
 netif_device_detach

Hi,

Sorry to have kept you waiting, I was away for a short break.

Patch tested below.

Op 10-05-2024 om 11:45 schreef Hardik Gajjar:
> On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote:
>> Oops, sorry, wrong file attached . Now correct one.
>>
>> Op 02-05-2024 om 22:13 schreef Ferry Toth:
>>> Op 02-05-2024 om 17:29 schreef Hardik Gajjar:
>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote:
>>>>> Hi,
>>>>>
>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth:
>>>>>> Hi,
>>>>>>
>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar:
>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar:
>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote:
>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote:
>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko:
>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote:
>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at
>>>>>>>>>>>>>> 08:37:42PM +0300, Andy
>>>>>>>>>>>>>> Shevchenko wrote:
>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote:
>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar:
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not
>>>>>>>>>>>>>>>> removed and it is a link, the link
>>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0
>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>> longer exists
>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is
>>>>>>>>>>>>>> present, but the symlink is
>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should
>>>>>>>>>>>>>> recreate the device, and the
>>>>>>>>>>>>>> symlink should become active again
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated):
>>>>>>>>>>>>
>>>>>>>>>>>> root@...a:~# ls
>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>> driver  net  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>
>>>>>>>>>>>> Then switching to host mode:
>>>>>>>>>>>>
>>>>>>>>>>>> root@...a:~# ls
>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>> ls: cannot access
>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/':
>>>>>>>>>>>> No such file
>>>>>>>>>>>> or directory
>>>>>>>>>>>>
>>>>>>>>>>>> Then back to device mode:
>>>>>>>>>>>>
>>>>>>>>>>>> root@...a:~# ls
>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/
>>>>>>>>>>>> driver  power  sound  subsystem  suspended  uevent
>>>>>>>>>>>>
>>>>>>>>>>>> net is missing. But, network functions:
>>>>>>>>>>>>
>>>>>>>>>>>> root@...a:~# ping 10.42.0.1
>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes
>>>>>>>>>>>>
>>>>>>>>>>>> Mass storage device is created and removed each time as expected.
>>>>>>>>>>>
>>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those
>>>>>>>>>>> two changes?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget.
>>>>>>>>>> I could not reproduce
>>>>>>>>>> the issue.
>>>>>>>>>>
>>>>>>>>>> I can see the usb0 exist all the time and accessible regardless of
>>>>>>>>>> the role switching of the USB mode (peripheral <-> host)
>>>>>>>>>>
>>>>>>>>>> Following are the logs:
>>>>>>>>>> //Host to device
>>>>>>>>>>
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral"
>>>>>>>>>>> mode
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>> usb0
>>>>>>>>>>
>>>>>>>>>> //device to host
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb
>>>>>>>>>> # echo "host" > mode
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls
>>>>>>>>>> a800000.dwc3/gadget/net/
>>>>>>>>>> usb0
>>>>>>>>>
>>>>>>>>> That is weird. When I switch to host mode (using
>>>>>>>>> the physical switch),
>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5)
>>>>>>>>>
>>>>>>>>> Switching back to device mode, that gadget
>>>>>>>>> directory is recreated. And
>>>>>>>>> gadget/sound as well, but not gadget/net.
>>>>>>>>>
>>>>>>>>>> s a800000.dwc3/gadget/net/usb0
>>>>>>>>>> <
>>>>>>>>>> addr_assign_type    duplex             phys_port_name
>>>>>>>>>> addr_len            flags              phys_switch_id
>>>>>>>>>> address             gro_flush_timeout  power
>>>>>>>>>> broadcast           ifalias            proto_down
>>>>>>>>>> carrier             ifindex            queues
>>>>>>>>>> carrier_changes     iflink             speed
>>>>>>>>>> carrier_down_count  link_mode          statistics
>>>>>>>>>> carrier_up_count    mtu                subsystem
>>>>>>>>>> dev_id              name_assign_type   tx_queue_len
>>>>>>>>>> dev_port            netdev_group       type
>>>>>>>>>> device              operstate          uevent
>>>>>>>>>> dormant             phys_port_id       waiting_for_supplier
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0
>>>>>>>>>> usb0      Link encap:Ethernet  HWaddr 3a:8b:63:97:1a:9a
>>>>>>>>>>               BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>>>>>               RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>>>>>               TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>>>>>               collisions:0 txqueuelen:1000
>>>>>>>>>>               RX bytes:0 TX bytes:0
>>>>>>>>>>
>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb #
>>>>>>>>>>
>>>>>>>>>> I strongly advise against reverting the patch solely based on the
>>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while
>>>>>>>>>> the usb0 interface remains available.
>>>>>>>>>
>>>>>>>>> There's more to it. I also mentioned that switching the role or
>>>>>>>>> unplugging the cable leaves the usb0 connection.
>>>>>>>>>
>>>>>>>>> I have while in host mode:
>>>>>>>>> root@...a:~# ifconfig -a usb0
>>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC>  mtu 1500
>>>>>>>>>             inet 10.42.0.221  netmask 255.255.255.0  broadcast
>>>>>>>>> 10.42.0.255
>>>>>>>>>             inet6 fe80::a8bb:ccff:fedd:eef1  prefixlen 64
>>>>>>>>> scopeid 0x20<link>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You don't see that because you didn't create a connection at all.
>>>>>>>>>
>>>>>>>>>> Instead, I recommend enabling FTRACE to
>>>>>>>>>> trace the functions involved
>>>>>>>>>> and identify which faulty call is responsible for removing usb0.
>>>>>>>>>
>>>>>>>>> Switching from device -> host -> device:
>>>>>>>>>
>>>>>>>>> root@...a:~# trace-cmd record -p function_graph -l *gether_*
>>>>>>>>>       plugin 'function_graph'
>>>>>>>>> Hit Ctrl^C to stop recording
>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000
>>>>>>>>>         188 bytes in size (4096 uncompressed)
>>>>>>>>> CPU1 data recorded at offset=0x1c9000
>>>>>>>>>         0 bytes in size (0 uncompressed)
>>>>>>>>> root@...a:~# trace-cmd report
>>>>>>>>> cpus=2
>>>>>>>>>          irq/68-dwc3-725   [000]   514.575337:
>>>>>>>>> funcgraph_entry:      #
>>>>>>>>> 2079.480 us |  gether_disconnect();
>>>>>>>>>          irq/68-dwc3-946   [000]   524.263731:
>>>>>>>>> funcgraph_entry:      +
>>>>>>>>> 11.640 us  |  gether_disconnect();
>>>>>>>>>          irq/68-dwc3-946   [000]   524.263743:
>>>>>>>>> funcgraph_entry:      !
>>>>>>>>> 116.520 us |  gether_connect();
>>>>>>>>>          irq/68-dwc3-946   [000]   524.268029:
>>>>>>>>> funcgraph_entry:      #
>>>>>>>>> 2057.260 us |  gether_disconnect();
>>>>>>>>>          irq/68-dwc3-946   [000]   524.270089:
>>>>>>>>> funcgraph_entry:      !
>>>>>>>>> 109.000 us |  gether_connect();
>>>>>>>>
>>>>>>>> I tried to get a more useful trace:
>>>>>>>> root@...a:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter
>>>>>>>> root@...a:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter
>>>>>>>> root@...a:/sys/kernel/tracing# echo function > current_tracer
>>>>>>>> root@...a:/sys/kernel/tracing# echo 'reset_config'
>>>>>>>>>> set_ftrace_filter
>>>>>>>> -> switch to host mode then back to device
>>>>>>>> root@...a:/sys/kernel/tracing# cat trace
>>>>>>>> # tracer: function
>>>>>>>> #
>>>>>>>> # entries-in-buffer/entries-written: 53/53   #P:2
>>>>>>>> #
>>>>>>>> #                                _-----=> irqs-off/BH-disabled
>>>>>>>> #                               / _----=> need-resched
>>>>>>>> #                              | / _---=> hardirq/softirq
>>>>>>>> #                              || / _--=> preempt-depth
>>>>>>>> #                              ||| / _-=> migrate-disable
>>>>>>>> #                              |||| /     delay
>>>>>>>> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
>>>>>>>> #              | |         |   |||||     |         |
>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.990254: reset_config
>>>>>>>> <-__composite_disconnect
>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992274: eem_disable
>>>>>>>> <-reset_config
>>>>>>>>         irq/68-dwc3-523     [000] D..3.   133.992276:
>>>>>>>> gether_disconnect
>>>>>>>> <-reset_config
>>>>>>>>         kworker/1:3-443     [001] ...1.   134.022453: eem_unbind
>>>>>>>> <-purge_configs_funcs
>>>>>>>>
>>>>>>>> -> to device mode
>>>>>>>>
>>>>>>>>         kworker/1:3-443     [001] ...1.   148.630773: eem_bind
>>>>>>>> <-usb_add_function
>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155209: eem_set_alt
>>>>>>>> <-composite_setup
>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155215:
>>>>>>>> gether_disconnect
>>>>>>>> <-eem_set_alt
>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.155220: gether_connect
>>>>>>>> <-eem_set_alt
>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157287: eem_set_alt
>>>>>>>> <-composite_setup
>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.157292:
>>>>>>>> gether_disconnect
>>>>>>>> <-eem_set_alt
>>>>>>>>         irq/68-dwc3-734     [000] D..3.   149.159338: gether_connect
>>>>>>>> <-eem_set_alt
>>>>>>>>         irq/68-dwc3-734     [000] D..2.   149.239625: eem_unwrap
>>>>>>>> <-rx_complete
>>>>>>>> ...
>>>>>>>>
>>>>>>>> I don't know where to look exactly. Any hints?
>>>>>>>
>>>>>>> do you see anything related to gether_cleanup() after eem_unbind() ?
>>>>>>
>>>>>> Nope. It's a pitty that the trace formatting got messed up
>>>>>> above. But as
>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced
>>>>>> function is called, until I flip the switch to device mode.
>>>>>> The ... at the end is where I cut uninteresting eem_unwrap
>>>>>> <-rx_complete
>>>>>> and eem_wrap <-eth_start_xmit lines.
>>>>>>
>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and
>>>>>>> network side to check who deleting the sysfs entry
>>>>>>
>>>>>> Yes, that's a vast amount of functions to trace. And I don't see how
>>>>>> that would be related to the patch we're discussing here. I was hoping
>>>>>> for a little more targeted hint.
>>>>>
>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*',
>>>>> 'usb_fun*',
>>>>> 'reset_config' and 'device_remove_file' leads me to:
>>>>>
>>>>> TIMESTAMP  FUNCTION
>>>>>      |         |
>>>>>     49.952477: eem_wrap <-eth_start_xmit
>>>>>     55.072455: eem_wrap <-eth_start_xmit
>>>>>     55.072621: eem_unwrap <-rx_complete
>>>>>     59.011540: configfs_composite_reset <-usb_gadget_udc_reset
>>>>>     59.011545: composite_reset <-configfs_composite_reset
>>>>>     59.011548: reset_config <-__composite_disconnect
>>>>>     59.013565: eem_disable <-reset_config
>>>>>     59.013567: gether_disconnect <-reset_config
>>>>>     59.049560: device_remove_file <-device_remove
>>>>>     59.051185: configfs_composite_disconnect <-usb_gadget_disco
>>>>>     59.051189: composite_disconnect <-configfs_composite_discon
>>>>>     59.051195: configfs_composite_unbind <-gadget_unbind_driver
>>>>>     59.052519: eem_unbind <-purge_configs_funcs
>>>>>     59.052529: composite_dev_cleanup <-configfs_composite_unbin
>>>>>     59.052537: device_remove_file <-composite_dev_cleanup
>>>>>
>>>>> device_remove_file gets called twice, once by device_remove after
>>>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup
>>>>> (removing the gadget)
>>>>
>>>> I believe that the device_remove_file function is only removing
>>>> suspend-specific attributes, not the complete gadget.
>>>> Typically, when you perform the role switch, the Gadget start/stop
>>>> function in your UDC driver is called. These functions should not
>>>> delete the gadget
>>>>
>>>> To investigate further, could you please enable the DWC3 functions
>>>> in ftrace and check who is removing the gadget?
>>>> I can also enable this on my system and compare the logs with yours,
>>>> but I will be in PI planning for 1.5 weeks and may not be able to
>>>> provide immediate support.
>>>
>>> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so
>>> trace is 600kb. I cut uninteresting stuff before
>>> configfs_composite_reset <-usb_gadget_udc_reset and after
>>> __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=>
>>> compare.
>>>
> Could you please try with the following patch and see if your issue resolves ?
> 
> diff --git a/drivers/usb/gadget/function/f_eem.c b/drivers/usb/gadget/function/f_eem.c
> index 3b445bd88498..c2a904475083 100644
> --- a/drivers/usb/gadget/function/f_eem.c
> +++ b/drivers/usb/gadget/function/f_eem.c
> @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f)
>   	struct usb_composite_dev *cdev = c->cdev;
>   	struct f_eem		*eem = func_to_eem(f);
>   	struct usb_string	*us;
> -	int			status;
> +	int			status = 0;
>   	struct usb_ep		*ep;
>   
>   	struct f_eem_opts	*eem_opts;
> @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f)
>   	 * with list_for_each_entry, so we assume no race condition
>   	 * with regard to eem_opts->bound access
>   	 */
> +	mutex_lock(&eem_opts->lock);
> +	gether_set_gadget(eem_opts->net, cdev->gadget);
> +
>   	if (!eem_opts->bound) {
> -		mutex_lock(&eem_opts->lock);
> -		gether_set_gadget(eem_opts->net, cdev->gadget);
>   		status = gether_register_netdev(eem_opts->net);
> -		mutex_unlock(&eem_opts->lock);
> -		if (status)
> -			return status;
> -		eem_opts->bound = true;
>   	}
>   
> +	mutex_unlock(&eem_opts->lock);
> +
> +	if (status)
> +		goto fail;
> +
> +	eem_opts->bound = true;
> +
>   	us = usb_gstrings_attach(cdev, eem_strings,
>   				 ARRAY_SIZE(eem_string_defs));
> 

After switching from host -> device -> host the usb0 device as seen by 
ifconfig remains "RUNNING" and in the routing table.

FYI This is the regression.

Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is 
still missing.

After deeper diving into v6.1.0, I found the latter a longer existing 
bug, not caused by your patch.

More over, this doesn't seem to do any harm.

The first issue does, as we try to route traffic over a device that no 
longer exists.

>>>> Additionally, please check if you have any customized DWC patches
>>>> that may be causing this problem.
>>>>
>>>>>
>>>>>> You may recall the whole issue did not occur before this
>>>>>> patch got applied.
>>>>>>
>>>>>>> Hardik
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> According to current kernel architecture of u_ether driver, only
>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its
>>>>>>>>>> kobject and sysfs interface.
>>>>>>>>>> I suggest sharing the analysis here to understand why this practice
>>>>>>>>>> is not followed in your use case or driver ?
>>>>>>>>>
>>>>>>>>> Yes, I'll try to trace where that happens.
>>>>>>>>>
>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems
>>>>>>>>> harmless? But the usb: net device remaining after disconnect or role
>>>>>>>>> switch is not good, as the route remains.
>>>>>>>>>
>>>>>>>>> May be they are 2 separate problems. Could you try to reproduce what
>>>>>>>>> happens if you make eem connection and then unplug?
>>>>>>>>>
>>>>>>>>>> I am curious why the driver was developed without adhering to the
>>>>>>>>>> kernel's gadget architecture.
>>>>>>>>
>>>>>>>> I don't know what you mean here. Which driver do you mean?
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check
>>>>>>>>>>>>>> with this patch and
>>>>>>>>>>>>>> share result here.  May be we need some fix in dwc3
>>>>>>>>>>>> Would have been nice if someone could test on other
>>>>>>>>>>>> controller as well. But
>>>>>>>>>>>> another instance of dwc3 is also very welcome.
>>>>>>>>>>>>> It's quite possible, please test on your side.
>>>>>>>>>>>>> We are happy to test any fixes if you come up with.
>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>>> With Best Regards,
>>>>>>>>>>> Andy Shevchenko
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>
> 
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ