Message-ID: <20251223110621.2b53f63d.michal.pecio@gmail.com>
Date: Tue, 23 Dec 2025 11:06:21 +0100
From: Michal Pecio <michal.pecio@...il.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: 胡连勤 <hulianqin@...o.com>, Greg Kroah-Hartman
 <gregkh@...uxfoundation.org>, Lee Jones <lee@...nel.org>, Mathias Nyman
 <mathias.nyman@...ux.intel.com>, "linux-usb@...r.kernel.org"
 <linux-usb@...r.kernel.org>, "linux-kernel@...r.kernel.org"
 <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] usb: xhci: check Null pointer in segment alloc

On Mon, 22 Dec 2025 22:24:35 -0500, Alan Stern wrote:
> > I see that devices recursively call bus_resume() before resuming,
> 
> Are you talking about hcd_bus_resume()?  (There is no function named 
> bus_resume() in usbcore.)  That's the routine in charge of resuming
> root hubs.  The PM core ensures that all of a device's ancestors are
> at full power before the device is resumed, so it would (indirectly)
> call this routine if an entire USB bus was suspended when a resume
> was requested for one of the devices on that bus.

Yes, I mean this function and the bus_resume() method of the HCD,
which the function calls.

> I can't see it being an autoresume in that situation, though.  An 
> autoresume is one that was requested by the device itself -- a wakeup 
> request -- and that can't happen if the HC is suspended.

Indeed, the crashing call stack looks like some driver manually
resuming a USB device.

[ 4021.987702][  T332]  usb_hcd_alloc_bandwidth+0x384/0x3e4
[ 4021.987711][  T332]  usb_set_interface+0x144/0x510
[ 4021.987716][  T332]  usb_reset_and_verify_device+0x248/0x5fc
[ 4021.987723][  T332]  usb_port_resume+0x580/0x700
[ 4021.987730][  T332]  usb_generic_driver_resume+0x24/0x5c
[ 4021.987735][  T332]  usb_resume_both+0x104/0x32c
[ 4021.987740][  T332]  usb_runtime_resume+0x18/0x28
[ 4021.987746][  T332]  __rpm_callback+0x94/0x3d4
[ 4021.987754][  T332]  rpm_resume+0x3f8/0x5fc
[ 4021.987762][  T332]  rpm_resume+0x1fc/0x5fc
[ 4021.987769][  T332]  __pm_runtime_resume+0x4c/0x90
[ 4021.987777][  T332]  usb_autopm_get_interface+0x20/0x4c
[ 4021.987783][  T332]  snd_usb_autoresume+0x68/0x124
[ 4021.987792][  T332]  suspend_resume_store+0x2a0/0x2b4 [dwc3_msm a4b7997a2e35cfe1a4a429762003b34dd4e85076]

Before we got here, we should have attempted an hcd_bus_resume().
If xhci_hcd tracked its HW_ACCESSIBLE state better, that would have
failed and hopefully aborted device resume before it crashed.

> > this fails with -ESHUTDOWN if the flag is unset, which seems to
> > prevent device resume from progressing further and crashing. Is
> > this what is meant to happen in such case?  
> 
> I'm not sure what code in what routine you're talking about.
> However, it's obvious that if the host controller's registers can't
> be accessed then no downstream device resume is going to work.

If HW_ACCESSIBLE isn't set then xhci_bus_resume() returns -ESHUTDOWN.
It always returns zero otherwise.

So in light of your explanation, the fact that xhci_resume() sets
HW_ACCESSIBLE before actually completing resume, and thus allows root
hub resume to pretend to work, is obviously a bug.
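
To make the ordering problem concrete, here is a toy user-space model of
it. This is only a sketch and not kernel code: struct model_hc,
model_bus_resume() and probe_mid_resume() are hypothetical names, and the
only behavior modelled is the one described above (-ESHUTDOWN when the
flag is unset, zero otherwise).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only. All names here are hypothetical, not kernel symbols. */
struct model_hc {
	bool hw_accessible;	/* stands in for the HW_ACCESSIBLE flag */
	bool resume_done;	/* set once the modelled HC resume has finished */
};

/* Models the check described above: -ESHUTDOWN if the flag is unset. */
static int model_bus_resume(const struct model_hc *hc)
{
	if (!hc->hw_accessible)
		return -ESHUTDOWN;
	return 0;
}

/*
 * Runs a modelled HC resume and attempts a root hub resume in the middle
 * of it, where the real driver may drop its lock and reinitialize memory.
 * Returns what the mid-resume bus resume attempt returned.
 */
static int probe_mid_resume(struct model_hc *hc, bool set_flag_early)
{
	int ret;

	assert(hc != NULL);
	hc->hw_accessible = false;
	hc->resume_done = false;

	if (set_flag_early)
		hc->hw_accessible = true;	/* ordering under suspicion */

	/* HC resume is still in progress at this point. */
	ret = model_bus_resume(hc);

	/* HC resume completes; the flag is certainly valid from here on. */
	hc->hw_accessible = true;
	hc->resume_done = true;
	return ret;
}
```

In this model, probe_mid_resume(&hc, true) returns 0, so a device resume
would be allowed to proceed against a half-resumed HC, while
probe_mid_resume(&hc, false) returns -ESHUTDOWN. Whether the real code
can simply move the flag later is exactly the ordering question I can't
answer.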

> > So I guess it's not happening because xhci_resume() sets this flag
> > right away and then it may drop the lock and start deallocating
> > memory to reset everything. So we can "successfully" complete
> > bus_resume() and allow USB devices to resume while HC resume is
> > still in progress.  
> 
> The root-hub resume (i.e., the ->bus_resume() callback) does not
> occur until after the HC's own resume handler returns.

If the PM core is supposed to prevent it then this is getting weird.
If not, then I'm not sure what else can prevent it.

> Is it possible that the HC's resume handler decided that the HC was 
> dead, and so started deallocating stuff, but failed to call 
> usb_hc_died()?  (But note that resume_common() in hcd-pci.c calls 
> usb_hc_died() automatically on the HCD's behalf when a resume fails.)

Hopefully not.

xhci->segment_pool is only modified by xhci_mem_cleanup() and by
xhci_mem_init() if allocation fails. And those functions are only
called at probe time, during HC resume and by hc_driver->stop().

I'm out of ideas without more logs. The xhci HW_ACCESSIBLE bug should
be fixed, but I'm not sure about the correct ordering of setting this
bit wrt some calls done by xhci_resume(), so no patch from me.

Regards,
Michal


