Message-ID: <ce74e3b4-ec01-4d99-9080-41dc15a13977@rowland.harvard.edu>
Date: Tue, 23 Dec 2025 13:37:41 -0500
From: Alan Stern <stern@...land.harvard.edu>
To: Michal Pecio <michal.pecio@...il.com>
Cc: 胡连勤 <hulianqin@...o.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Lee Jones <lee@...nel.org>,
Mathias Nyman <mathias.nyman@...ux.intel.com>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] usb: xhci: check Null pointer in segment alloc

On Tue, Dec 23, 2025 at 11:06:21AM +0100, Michal Pecio wrote:
> Indeed, the crashing call stack looks like some driver manually
> resuming a USB device.
>
> [ 4021.987702][ T332] usb_hcd_alloc_bandwidth+0x384/0x3e4
> [ 4021.987711][ T332] usb_set_interface+0x144/0x510
> [ 4021.987716][ T332] usb_reset_and_verify_device+0x248/0x5fc
> [ 4021.987723][ T332] usb_port_resume+0x580/0x700
> [ 4021.987730][ T332] usb_generic_driver_resume+0x24/0x5c
> [ 4021.987735][ T332] usb_resume_both+0x104/0x32c
> [ 4021.987740][ T332] usb_runtime_resume+0x18/0x28
> [ 4021.987746][ T332] __rpm_callback+0x94/0x3d4
> [ 4021.987754][ T332] rpm_resume+0x3f8/0x5fc
> [ 4021.987762][ T332] rpm_resume+0x1fc/0x5fc

rpm_resume() is in the PM core. So this isn't just a USB thing.

> [ 4021.987769][ T332] __pm_runtime_resume+0x4c/0x90
> [ 4021.987777][ T332] usb_autopm_get_interface+0x20/0x4c
> [ 4021.987783][ T332] snd_usb_autoresume+0x68/0x124
> [ 4021.987792][ T332] suspend_resume_store+0x2a0/0x2b4 [dwc3_msm a4b7997a2e35cfe1a4a429762003b34dd4e85076]
>
> Before we got here, we should have attempted an hcd_bus_resume().
> If xhci_hcd tracked its HW_ACCESSIBLE state better, that would have
> failed and hopefully aborted device resume before it crashed.

The reason we didn't is that the PM core thought the HC and root hub
were already at full power. Possibly because they were resumed before
the start of the log, or possibly because they were never suspended.
We really need to know what happened leading up to this crash.

> If HW_ACCESSIBLE isn't set then xhci_bus_resume() returns -ESHUTDOWN.
> It always returns zero otherwise.
>
> So in the light of your explanation, the fact that xhci_resume() sets
> HW_ACCESSIBLE before actually completing resume and thus allows root
> hub resume to pretend to work, is obviously a bug.

No, not really. The proper time to set HW_ACCESSIBLE is when it becomes
possible to do I/O to the HC's registers, i.e., when the controller
changes from D3 to D0 (and maybe once a few other things like
pci_set_master() have been done). By the time xhci_resume() gets called
this should already have happened, so setting the flag immediately is
the right thing for it to do.

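To make the ordering concrete, here is a small userspace model of the
behaviour discussed above. It is only a sketch and not kernel code: the
struct and the model_*() functions are made up for illustration, and the
boolean merely stands in for the HW_ACCESSIBLE flag. The one thing taken
from the thread is that the bus resume check returns -ESHUTDOWN when the
flag is clear and zero otherwise.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the host controller state; the boolean
 * models the HW_ACCESSIBLE flag discussed in the thread. */
struct model_hcd {
	bool hw_accessible;	/* registers can be read and written */
};

/* Model of controller suspend: registers become unreachable. */
static void model_hc_suspend(struct model_hcd *hcd)
{
	hcd->hw_accessible = false;
}

/* Model of the controller resume callback. The assumption is that the
 * device is already back in D0 when this runs, so the flag is set
 * first and the remaining resume work follows. */
static int model_hc_resume(struct model_hcd *hcd)
{
	hcd->hw_accessible = true;
	/* ... the rest of the controller resume work would go here ... */
	return 0;
}

/* Model of the root hub resume check: -ESHUTDOWN if the flag is
 * clear, zero otherwise. */
static int model_bus_resume(const struct model_hcd *hcd)
{
	if (!hcd->hw_accessible)
		return -ESHUTDOWN;
	return 0;
}
```

In this model a root hub resume attempted after model_hc_suspend() fails
with -ESHUTDOWN, and one attempted after model_hc_resume() succeeds,
whether or not the rest of the controller resume went well.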
> xhci->segment_pool is only modified by xhci_mem_cleanup() and by
> xhci_mem_init() if allocation fails. And those functions are only
> called at probe time, during HC resume and by hc_driver->stop().
>
> I'm out of ideas without more logs. The xhci HW_ACCESSIBLE bug should
> be fixed, but I'm not sure about correct ordering of setting this bit
> wrt some calls done by xhci_resume(), so no patch from me.

Agreed, we can't do anything without more and better logs. Adding
dev_info() lines to the start and end of the various xhci-hcd suspend
and resume routines, as well as xhci_mem_cleanup() and xhci_mem_init()
and whatever else you can think of, would be a good start.

Can you write a patch that does this, and can 胡连勤 test it?
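
The kind of tracing meant here is roughly the following. This is a
userspace mock-up under stated assumptions, not a patch: mock_dev_info()
is a made-up stand-in for the kernel's dev_info() that appends to a
buffer so the output can be inspected, and traced_resume() is a
hypothetical placeholder for any of the routines mentioned above.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Buffer collecting the trace; a stand-in for the kernel log. */
static char trace_buf[512];

/* Hypothetical userspace substitute for dev_info(): formats the
 * message and appends it to trace_buf, truncating if it is full. */
static void mock_dev_info(const char *fmt, ...)
{
	size_t used = strlen(trace_buf);
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(trace_buf + used, sizeof(trace_buf) - used, fmt, ap);
	va_end(ap);
}

/* Hypothetical routine standing in for a suspend/resume or memory
 * init/cleanup function. It logs on entry and on exit, including the
 * return value, so the ordering of calls shows up in the log. */
static int traced_resume(int simulated_result)
{
	int ret;

	mock_dev_info("%s: start\n", __func__);
	ret = simulated_result;	/* the real work would go here */
	mock_dev_info("%s: end, ret=%d\n", __func__, ret);
	return ret;
}
```

In the driver itself the same pattern would call dev_info() with the
appropriate struct device pointer; which pointer to use in each routine
is left open here.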

Alan Stern