[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d507c790-54c2-4708-a52b-e8c94b6f6875@linuxfoundation.org>
Date: Wed, 9 Jul 2025 15:33:03 -0600
From: Shuah Khan <skhan@...uxfoundation.org>
To: Zongmin Zhou <min_halo@....com>, Greg KH <gregkh@...uxfoundation.org>
Cc: shuah@...nel.org, valentina.manea.m@...il.com, i@...ithal.me,
linux-kernel@...r.kernel.org, linux-usb@...r.kernel.org,
zhouzongmin@...inos.cn, Shuah Khan <skhan@...uxfoundation.org>
Subject: Re: [PATCH v2] usbip: convert to use faux_device
On 7/9/25 03:07, Zongmin Zhou wrote:
>
> On 2025/7/9 02:16, Shuah Khan wrote:
>> On 7/3/25 00:04, Zongmin Zhou wrote:
>>>
>>> On 2025/7/3 07:54, Shuah Khan wrote:
>>>> On 7/1/25 20:12, Zongmin Zhou wrote:
>>>>>
>>>>> On 2025/7/2 06:56, Shuah Khan wrote:
>>>>>> On 6/23/25 21:21, Zongmin Zhou wrote:
>>>>>>>
>>>>>>> On 2025/6/21 01:26, Shuah Khan wrote:
>>>>>>>> On 6/20/25 03:27, Greg KH wrote:
>>>>>>>>> On Fri, Jun 20, 2025 at 05:19:33PM +0800, Zongmin Zhou wrote:
>>>>>>>>>>
>>>>>>>>>> On 2025/6/20 12:29, Greg KH wrote:
>>>>>>>>>>> On Fri, Jun 20, 2025 at 10:16:16AM +0800, Zongmin Zhou wrote:
>>>>>>>>>>>> On 2025/6/19 19:01, Greg KH wrote:
>>>>>>>>>>>>> On Wed, Jun 04, 2025 at 02:54:10PM +0800, Zongmin Zhou wrote:
>>>>>>>>>>>>>> From: Zongmin Zhou <zhouzongmin@...inos.cn>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The vhci driver does not need to create a platform device,
>>>>>>>>>>>>>> it only did so because it was simple to do that in order to
>>>>>>>>>>>>>> get a place in sysfs to hang some device-specific attributes.
>>>>>>>>>>>>>> Now the faux device interface is more appropriate,change it
>>>>>>>>>>>>>> over to use the faux bus instead.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Zongmin Zhou <zhouzongmin@...inos.cn>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> Changes in v2:
>>>>>>>>>>>>>> - don't change faux create api,just call probe on vhci_hcd_init.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> drivers/usb/usbip/vhci.h | 4 +-
>>>>>>>>>>>>>> drivers/usb/usbip/vhci_hcd.c | 86 +++++++++++-----------------
>>>>>>>>>>>>>> drivers/usb/usbip/vhci_sysfs.c | 68 +++++++++++-----------
>>>>>>>>>>>>>> tools/usb/usbip/libsrc/vhci_driver.h | 2 +-
>>>>>>>>>>>>>> 4 files changed, 72 insertions(+), 88 deletions(-)
>>>>>>>>>>>>> I get the following build errors from this patch:
>>>>>>>>>>>>>
>>>>>>>>>>>>> drivers/usb/usbip/vhci_hcd.c:1462:12: error: ‘vhci_hcd_resume’ defined but not used [-Werror=unused-function]
>>>>>>>>>>>>> 1462 | static int vhci_hcd_resume(struct faux_device *fdev)
>>>>>>>>>>>>> | ^~~~~~~~~~~~~~~
>>>>>>>>>>>>> drivers/usb/usbip/vhci_hcd.c:1418:12: error: ‘vhci_hcd_suspend’ defined but not used [-Werror=unused-function]
>>>>>>>>>>>>> 1418 | static int vhci_hcd_suspend(struct faux_device *fdev, pm_message_t state)
>>>>>>>>>>>>> | ^~~~~~~~~~~~~~~~
>>>>>>>>>>>>> cc1: all warnings being treated as errors
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you sure you tested this?
>>>>>>>>>>>> I apologize for not enabling -Werror, which resulted in missing this error
>>>>>>>>>>>> warning.
>>>>>>>>>>>> I have tested usbip feature use the new patch,but not test system
>>>>>>>>>>>> suspend/resume.
>>>>>>>>>>>> The faux bus type don't add pm function,and vhci-hcd driver can't register
>>>>>>>>>>>> it.
>>>>>>>>>>>> Maybe have to add suspend/resume for it.like below:
>>>>>>>>>>>> static const struct bus_type faux_bus_type = {
>>>>>>>>>>>> .name = "faux",
>>>>>>>>>>>> .match = faux_match,
>>>>>>>>>>>> .probe = faux_probe,
>>>>>>>>>>>> .remove = faux_remove,
>>>>>>>>>>>> .resume = faux_resume,
>>>>>>>>>>>> .suspend = faux_suspend,
>>>>>>>>>>>> };
>>>>>>>>>>>>
>>>>>>>>>>>> Is that right?
>>>>>>>>>>>> Your expertise would be greatly valued.
>>>>>>>>>>> As this is not real hardware, why do you need the suspend/resume
>>>>>>>>>>> callbacks at all? What is happening here that requires them?
>>>>>>>>>> @greg,
>>>>>>>>>> The vhci_hcd_suspend/vhci_hcd_resume interfaces are not designed for this
>>>>>>>>>> faux device, but rather to
>>>>>>>>>> manipulate the HCD_FLAG_HW_ACCESSIBLE bit in the hcd flags associated with
>>>>>>>>>> the faux device.
>>>>>>>>>> For example:
>>>>>>>>>> During system standby: clear_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags)
>>>>>>>>>> During system wakeup: set_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags)
>>>>>>>>>>
>>>>>>>>>> Previously, these two functions were registered through platform_driver,
>>>>>>>>>> but faux bus does not have the relevant interface, so they were not called,
>>>>>>>>>> resulting in this compilation warning error.
>>>>>>>>>>
>>>>>>>>>> This raises the question: Should the faux bus implement PM-related
>>>>>>>>>> interface?
>>>>>>>>>> I'm uncertain whether these functions are essential for the vhci-hcd driver
>>>>>>>>>> or if they can be safely removed.
>>>>>>>>>>
>>>>>>>>>> However, during system standby/wakeup tests with remote USB devices bound to
>>>>>>>>>> the vhci-hcd driver,
>>>>>>>>>> I observed consistent failure scenarios across both the original platform
>>>>>>>>>> bus and faux bus patch implementations.
>>>>>>>>
>>>>>>>> suspend and resume hooks have been in the code from beginning
>>>>>>>> in the CONFIG_PM path. The original authors are skeptical about
>>>>>>>> what should happen during suspend"
>>>>>>>>
>>>>>>>> /* what should happen for USB/IP under suspend/resume? */
>>>>>>>> suspend hook checks for active connections and sends EBBUSY
>>>>>>>> back to pm core.
>>>>>>>>
>>>>>>>> Active connection means any of the ports are in USB_PORT_STAT_CONNECTION
>>>>>>>> state. So as long as there are active connections between vhci client
>>>>>>>> and the host, suspend won't work. So we really don't know what happens
>>>>>>>> to the active connections if there are no suspend/resume hooks.
>>>>>>>>
>>>>>>>> If there are no active connections, then it will clear the HCD_FLAG_HW_ACCESSIBLE
>>>>>>>> bit and returns to pm core to continue with suspend.
>>>>>>>>
>>>>>>>> resume sets the HCD_FLAG_HW_ACCESSIBLE flag to indicate hardware is accessible
>>>>>>>> and initiates port status poll.
>>>>>>>>
>>>>>>>> - suspend hook prevents suspend
>>>>>>>>
>>>>>>>> With faux device since there is no suspend and resume hook, I would expect
>>>>>>>> these hooks are not a factor in suspend and resume.
>>>>>>>>
>>>>>>>> vhci_hcd is a special case virtual driver as it is a proxy for controlling
>>>>>>>> hardware on the host.
>>>>>>>>
>>>>>>>> Zongmin,
>>>>>>>>
>>>>>>>> Do suspend/resume work when vhci_hcd is not loaded?
>>>>>>>> What happens when when system suspends and resumes with faux device?
>>>>>>>> Can you send me dmesg logs and pm logs?
>>>>>>>>
>>>>>>> Hi Greg,shuah
>>>>>>>
>>>>>>> Yes, system with vhci_hcd unload or just load vhci_hcd driver
>>>>>>> without usbip devices bound to vhci_hcd,system suspend/resume worked well.
>>>>>>> Some logs for you.
>>>>>>
>>>>>> Sorry for the delay. I was at a conference last week.
>>>>>> Thank you for sending these logs.
>>>>>>
>>>>>>> 1.system with vhci_hcd driver load,but don't bound any usbip devices to vhci_hcd,suspend/resume worked well.
>>>>>>> see dmesg-faux bus-load.log
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> 2.usbip device bound to vhci_hcd,and do system suspend action,suspend/resume worked failed.
>>>>>>> I observed two distinct failure scenario:
>>>>>>> Scenario 1: System failed to enter suspend state,and will auto resume;
>>>>>>
>>>>>>
>>>>>>> the log for use platform bus:
>>>>>>> dmesg-platform bus-device bound-auto resume.log
>>>>>>
>>>>>> This is clear from the suspend hook in the vhci_hcd.
>>>>>> When it returns EBUSY, suspend will fail and move to
>>>>>> auto resume.
>>>>>>
>>>>>>> the log for use faux bus:
>>>>>>> dmesg-faux bus-device bound-auto resume.log
>>>>>>
>>>>>> It is good that the behavior is same with faux bus even though
>>>>>> suspend hook isn't called. I will take a look at the log.
>>>>>>
>>>>>>>
>>>>>>> Scenario 2: System resume failed with black screen freeze,a forced restart of the machine is require.
>>>>>>> the log for use platform bus:
>>>>>>> dmesg-platform bus-device bound-black screen.log
>>>>>>> the log for use faux bus:
>>>>>>> dmesg-faux bus-device bound-black screen.log
>>>>>>
>>>>>> That is bad. When does this happen? Is there a difference in
>>>>>> configuration?
>>>>>>
>>>>> No, there's no difference. The same code is used for two different failure scenarios.
>>>>> I just tested many times. These two different situations occur probabilistically.
>>>>> But they both happened only when the USBIP device is bound to the vhci_hcd driver.
>>>>>> The behavior same for platform bus and faux bus. Sounds like
>>>>>> an existing problem that needs to be debugged separately.
>>>>>>
>>>>>> I will take a look at all these logs and get back to you in
>>>>>> a day or two.
>>>>>>
>>>>
>>>> I looked at the logs and here is what I found:
>>>>
>>>> Scenario 1:
>>>> dmesg-faux bus-device bound-auto resume.log
>>>> dmesg-platform bus-device bound-auto resume.log
>>>>
>>>> In this case suspend bailed out way before driver suspend
>>>> when vhci_hcd is using platform and faux bus.
>>>>
>>>> Freezing remaining freezable tasks failed after 20.006 seconds (0 tasks refusing to freeze, wq_busy=1)
>>>> Restarting kernel threads ... done
>>>> OOM killer enabled.
>>>> Restarting tasks ... done.
>>>> random: crng reseeded on system resumption
>>>> PM: suspend exit
>>>>
>>>> Auto-resume of the user-space worked. Scenario 1 isn't really
>>>> interesting.
>>>>
>>>> Scenario 2:
>>>> dmesg-faux bus-device bound-black screen.log
>>>> dmesg-platform bus-device bound-black screen.log
>>>>
>>>> Even though the result is the same in seeing blank screen, how
>>>> we get there is different.
>>>>
>>>> =================
>>>> faux-bus device:
>>>> =================
>>>> - suspend worked - looks like
>>>> usb 4-1: PM: calling usb_dev_suspend @ 6054, parent: usb4
>>>> usb 4-1: PM: usb_dev_suspend returned 0 after 13675 usecs
>>>> usb usb4: PM: calling usb_dev_suspend @ 6055, parent: vhci_hcd.0
>>>> usb usb4: PM: usb_dev_suspend returned 0 after 9 usecs
>>>>
>>>> vhci_hcd suspend isn't in play here. usb_dev_suspend() returns.
>>>> See below
>>>>
>>>> usb 4-1: PM: usb_dev_suspend returned message.
>>>>
>>>> -------------------------------------------------------------
>>>>
>>>> - resume started (assuming it has been initiated by user)
>>>>
>>>> [ 650.027491][ T6056] pci 0000:00:01.0: PM: pci_pm_suspend_noirq returned 0 after 304 usecs
>>>>
>>>> See see timestamp difference between the last suspend message and the
>>>> first resume message.
>>>> [ 674.000257][ T39] pci 0000:00:00.0: PM: calling pci_pm_resume_noirq @ 39, parent: pci0000:00
>>>>
>>>> usb 4-1 usb_dev_resume never returns.
>>>>
>>>> [ 674.071125][ T6117] usb usb4: PM: usb_dev_resume returned 0 after 21110 usecs
>>>> [ 674.113991][ T6126] usb 4-1: PM: calling usb_dev_resume @ 6126, parent: usb4
>>>>
>>>> -------------------------------------------------------------
>>>>
>>>> =====================
>>>> platform bus device
>>>> =====================
>>>>
>>>> - suspend was aborted because vhci_hcd suspend routine returned error
>>>>
>>>> [ 297.854621][ T9402] usb 4-1: PM: calling usb_dev_suspend @ 9402, parent: usb4
>>>> [ 297.868524][ T9402] usb 4-1: PM: usb_dev_suspend returned 0 after 13214 usecs
>>>> [ 297.869994][ T9403] usb usb4: PM: calling usb_dev_suspend @ 9403, parent: vhci_hcd.0
>>>> [ 297.871959][ T9403] usb usb4: PM: usb_dev_suspend returned 0 after 30 usecs
>>>> [ 297.873644][ T9394] vhci_hcd vhci_hcd.0: PM: calling platform_pm_suspend @ 9394, parent: platform
>>>> [ 297.874738][ T9394] vhci_hcd vhci_hcd.0: We have 1 active connection. Do not suspend.
>>>> [ 297.875369][ T9394] vhci_hcd vhci_hcd.0: PM: dpm_run_callback(): platform_pm_suspend returns -16
>>>> [ 297.876078][ T9394] vhci_hcd vhci_hcd.0: PM: platform_pm_suspend returned -16 after 1341 usecs
>>>> [ 297.876774][ T9394] vhci_hcd vhci_hcd.0: PM: failed to suspend: error -16
>>>>
>>>> - the following triggers resume
>>>> [ 297.877321][ T9394] PM: Some devices failed to suspend, or early wake event detected
>>>>
>>>> [ 297.881065][ T9403] usb usb3: PM: usb_dev_resume returned 0 after 19 usecs
>>>> [ 297.904551][ T9408] usb usb4: PM: usb_dev_resume returned 0 after 22911 usecs
>>>> [ 297.905148][ T9418] usb 4-1: PM: calling usb_dev_resume @ 9418, parent: usb4
>>>>
>>>> usb 4-1 usb_dev_resume never returns.
>>>>
>>>> Note - In both cases, usb_dev_resume doesn't return. When it is called is the
>>>> difference.
>>>>
>>>> I don't think suspend/resume works when devices are bound. Suspend exits and
>>>> starts resume which seems to fail because it doesn't handle the virtual usb
>>>> device resume. There is a missing piece here.
>>>>
>>>> When vhci_hcd imports a device and acts as a proxy between the virtual mass
>>>> storage device (e.g in this case) - it appears suspend and resume are
>>>> handled as if it is a usb device. Maybe this is incorrect?
>>>>
>>>> usb_dev_suspend() works and usb_dev_resume() on these virtual usb devices?
>>>> Do we need to handle this in usb_dev_resume()?
>>>>
>>>> Talking out loud - I have to do some looking into.
>>>>
>>> Re:
>>> Yes, your analysis is completely correct.
>>>
>>> In fact, I've experimented with adding PM hooks to the faux bus,
>>> and found that faux bus devices then behave identically to platform bus devices during suspend/resume.
>>> See the attachment.
>>>
>>
>> Thanks for checking this scenario. No surprises here.
> Another part of my purpose in doing this is that the vhci-hcd driver seems
> should still retain suspend/resume hooks. Therefore, the faux bus should
> add corresponding hooks to allow the driver to call its own pm functions.
> Though currently don't know how to fix this problem yet.
>>
>>> This is likely a historical legacy issue.
>>
>> It is an existing problem in the way vhci_hcd and the bound devices
>> handle suspend/resume. vhci_hcd suspend assumes once it returns
>> "don't suspend" the rest works. However suspend for the bound device
>> runs first and a subsequent resume on it fails.
>>
>> This problem needs fixing - I don't know yet how to.
>>
>>> Regarding this matter, is there anything else you'd like me to assist with?
>>>
>>
>> One thing I am curious about is the status of the bound devices after
>> "forced restart of the machine" when you see blank screen or hang?
> That's an excellent question.Another persistent problem has surfaced during troubleshooting:
> After a USB/IP device is bound to the vhci-hcd driver, when the machine
> reboots or performs a forced restart(include resume fail,forced restart),
> attempting to re-bind the device to the vhci-hcd driver will generate the error message
> "usbip: error: Attach Request for 4-2 failed - Device busy (exported)".
> At this point, the device must first be explicitly detached
> from the usbip-host driver before it can be bound again.
The error is coming from the usbip host (server) which would have
stayed up during client side reboot.
It requires an extra detach step before the attach. vhci client
doesn't do detach on it own. Attach and detach are user initiated.
So the current behavior is inline with that.
I don't dislike the shutdown use which essentially mimics
a series of "detach" commands from the user-space. However the
solution that works in conjunction with the usbip host side.
usbip_host - stub driver doesn't implement suspend/resume.
> I implemented a shutdown hook (can refer to the attached file) in the vhci-hcd driver
> to perform vhci_hcd remove before system shutdown.
> This resolves failures where devices cannot rebind to
> the vhci-hcd driver after normal reboots. However,
> it remains ineffective for forced shutdown/reboot scenarios
> since no shutdown functions are executed in such cases.
The problem here is that current vhci_suspend/resume are ineffective
usb core initiates usb_dev_suspend on the virtual device.
The solution could be to disable auto-suspend on virtual devices.
This would eliminate the suspend and resume hanging issue.
There is the server side issue - what happens when usbip host
suspends? We have a few components here to think about.
usbipd (user-space)
vhci_hcd and the usb devices it creates
usbip_host, stub driver that proxies between the device
on the server and vhci_client.
PM can be complex and it has to do lot more than it currently
does on both server and client end to support seamlessly.
thanks,
-- Shuah
Powered by blists - more mailing lists