Date:   Tue, 5 Dec 2023 16:58:25 +0200
From:   Mathias Nyman <mathias.nyman@...ux.intel.com>
To:     Yaxiong Tian <iambestgod@...look.com>, mathias.nyman@...el.com,
        gregkh@...uxfoundation.org
Cc:     linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org,
        tianyaxiong@...inos.cn
Subject: Re: [PATCH] usb:xhci: Avoid hub_event() stuck when xHC restore state
 timeout

On 4.12.2023 10.02, Yaxiong Tian wrote:
> From: Yaxiong Tian <tianyaxiong@...inos.cn>
> 
> When restoring the xHC state times out, xhci_resume() returns -ETIMEDOUT

Out of curiosity, have you tried whether it is still possible to revive your
xHC controller here?

Instead of returning -ETIMEDOUT, try setting "reinit_xhc = true" and
jumping to the "if (reinit_xhc) {" block, where we already reinitialize the
xHC in xhci_resume() because of other resume issues.
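A rough userspace model of the suggested control flow may help. This is a sketch, not the real xhci_resume(): struct fake_xhc, fake_restore_state(), fake_reinit() and fake_resume() are hypothetical names, and the power_lost argument only stands in for whatever already sets reinit_xhc in the driver. It shows a restore timeout falling through to the reinitialization path instead of returning -ETIMEDOUT.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the resume path; only the reinit_xhc flow is shown. */
struct fake_xhc {
    bool restore_times_out; /* simulate the restore-state step never finishing */
    bool reinit_done;       /* set when the reinitialization path runs */
    bool running;
};

/* Hypothetical: 0 on success, -ETIMEDOUT if the restore never completes. */
static int fake_restore_state(struct fake_xhc *xhc)
{
    return xhc->restore_times_out ? -ETIMEDOUT : 0;
}

/* Hypothetical stand-in for the body of the "if (reinit_xhc) {" block. */
static int fake_reinit(struct fake_xhc *xhc)
{
    xhc->reinit_done = true;
    xhc->running = true;
    return 0;
}

static int fake_resume(struct fake_xhc *xhc, bool power_lost)
{
    bool reinit_xhc = power_lost;

    if (!reinit_xhc && fake_restore_state(xhc) == -ETIMEDOUT) {
        /* Suggested change: instead of "return -ETIMEDOUT;",
         * fall through to the reinitialization path. */
        reinit_xhc = true;
    }

    if (reinit_xhc)
        return fake_reinit(xhc);

    xhc->running = true;
    return 0;
}
```

A goto to the reinit block would be equivalent; the fall-through just keeps the sketch short.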

> instantly. After usb_hc_died() is called, it kicks hub_wq to run
> hub_event(), but the workqueue is frozen. When suspend ends, hub_event()
> actually runs and gets stuck.
> For example:
> [  968.794016][ 2] [   T37] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  968.802969][ 2] [   T37] kworker/2:3     D    0   999      2 0x00000028
> [  968.809579][ 2] [   T37] Workqueue: usb_hub_wq hub_event
> [  968.814885][ 2] [   T37] Call trace:
> [  968.818455][ 2] [   T37]  __switch_to+0xd4/0x138
> [  968.823067][ 2] [   T37]  __schedule+0x2dc/0x6a0
> [  968.827680][ 2] [   T37]  schedule+0x34/0xb0
> [  968.831947][ 2] [   T37]  schedule_timeout+0x1e0/0x298
> [  968.837079][ 2] [   T37]  __wait_for_common+0xf0/0x208
> [  968.842212][ 2] [   T37]  wait_for_completion+0x1c/0x28
> [  968.847432][ 2] [   T37]  xhci_configure_endpoint+0x104/0x640
> [  968.853173][ 2] [   T37]  xhci_check_bandwidth+0x140/0x2e0
> [  968.858652][ 2] [   T37]  usb_hcd_alloc_bandwidth+0x1c8/0x348
> [  968.864393][ 2] [   T37]  usb_disable_device+0x198/0x260
> [  968.869698][ 2] [   T37]  usb_disconnect+0xdc/0x3a0
> [  968.874571][ 2] [   T37]  usb_disconnect+0xbc/0x3a0
> [  968.879441][ 2] [   T37]  hub_quiesce+0xa0/0x108
> [  968.884053][ 2] [   T37]  hub_event+0x4d4/0x1558
> [  968.888664][ 2] [   T37]  kretprobe_trampoline+0x0/0xc4
> [  968.893884][ 2] [   T37]  worker_thread+0x4c/0x488
> [  968.898668][ 2] [   T37]  kthread+0xf8/0x128
> [  968.902933][ 2] [   T37]  ret_from_fork+0x10/0x18
> 
> The result is that the system cannot suspend again, because the wq can't
> be frozen. It is also hard to reboot when some application has accessed
> this code path.
> 
> The hang happens because code that accesses the xHCI hardware is still
> being called, but the xHC has a problem and is at least not running
> (xhci_restore_registers() loads the saved op_regs, and CMD_RUN was
> cleared in xhci_suspend()).
> 
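The freeze problem in the quoted report can be modeled with a single-slot, freezable queue. In this sketch (all names hypothetical, not the kernel workqueue API), work queued while the queue is frozen is deferred until thaw, a work item that never completes stays in flight, and freezing is refused while work is in flight, which roughly mirrors why the next suspend attempt fails.

```c
#include <stdbool.h>

/* Hypothetical single-slot model of a freezable workqueue such as hub_wq. */
struct fake_wq {
    bool frozen;  /* set during system suspend */
    bool pending; /* work queued but not started */
    bool busy;    /* work started but never finished (blocked) */
};

/* A work function returns true if it completes, false if it blocks. */
typedef bool (*fake_work_fn)(void);

/* Hypothetical hub_event() on a halted controller: it would wait for a
 * command completion that never arrives, so the model reports it as blocked. */
static bool fake_hub_event_on_halted_hc(void)
{
    return false;
}

/* Like usb_hc_died() kicking hub_wq: queuing is allowed while frozen. */
static void fake_queue_work(struct fake_wq *wq)
{
    wq->pending = true;
}

/* Run deferred work, but only when the queue is not frozen. */
static void fake_run_pending(struct fake_wq *wq, fake_work_fn fn)
{
    if (wq->frozen || !wq->pending)
        return;
    wq->pending = false;
    wq->busy = !fn();
}

/* Freezing is refused while a work item is still in flight. */
static bool fake_try_freeze(struct fake_wq *wq)
{
    if (wq->busy)
        return false;
    wq->frozen = true;
    return true;
}

static void fake_thaw(struct fake_wq *wq)
{
    wq->frozen = false;
}
```

Run in order (freeze, queue, thaw, run, freeze again), the second fake_try_freeze() call returns false, which corresponds to the "cannot suspend again" symptom.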

Nice catch and debugging work btw.
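The root cause in the quoted analysis can be illustrated with a tiny register model. It assumes, as the analysis states, that xhci_suspend() clears CMD_RUN before the operational registers are saved, so restoring them leaves the controller stopped. FAKE_CMD_RUN uses bit 0 (Run/Stop) of USBCMD from the xHCI specification; struct fake_op_regs and the helpers are hypothetical.

```c
#include <stdint.h>

#define FAKE_CMD_RUN (1u << 0) /* USBCMD Run/Stop bit per the xHCI spec */

/* Hypothetical model of the operational registers (command only). */
struct fake_op_regs {
    uint32_t command;
};

/* Models suspend: halt the controller, then save the command register. */
static uint32_t fake_suspend_and_save(struct fake_op_regs *regs)
{
    regs->command &= ~FAKE_CMD_RUN; /* Run/Stop cleared before saving */
    return regs->command;           /* saved copy therefore has RUN clear */
}

/* Models restoring the saved operational registers on resume. */
static void fake_restore(struct fake_op_regs *regs, uint32_t saved)
{
    regs->command = saved;
}

static int fake_controller_running(const struct fake_op_regs *regs)
{
    return (regs->command & FAKE_CMD_RUN) != 0;
}
```

Unless something sets CMD_RUN again after the restore, which an early -ETIMEDOUT return skips, commands queued afterwards are never processed.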

> So use the XHCI_STATE_DYING flag to stop any code from touching the
> hardware immediately. hub_event() will then complete, the usb_hc_died()
> tasks will finish, and some sys interfaces will be removed.
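A minimal sketch of the approach proposed in the patch, assuming a hypothetical FAKE_STATE_DYING bit in place of XHCI_STATE_DYING and a hypothetical fake_configure_endpoint() in place of the real command path: once the flag is set on the restore-timeout error path, the command path bails out with an error instead of waiting for a completion that a stopped controller will never deliver. It does not model the command ring and pending-URB cleanup that the real flag is tied to in xhci_hc_died().

```c
#include <errno.h>

#define FAKE_STATE_DYING (1u << 0) /* hypothetical stand-in for XHCI_STATE_DYING */

struct fake_xhci {
    unsigned int xhc_state;
    int commands_sent; /* commands that would have reached hardware */
};

/* Hypothetical command path: refuse to touch hardware once dying. */
static int fake_configure_endpoint(struct fake_xhci *xhci)
{
    if (xhci->xhc_state & FAKE_STATE_DYING)
        return -ENODEV; /* caller fails fast instead of waiting forever */
    xhci->commands_sent++;
    return 0;
}

/* Hypothetical resume error path as proposed in the patch. */
static int fake_resume_restore_timeout(struct fake_xhci *xhci)
{
    xhci->xhc_state |= FAKE_STATE_DYING;
    return -ETIMEDOUT;
}
```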

The XHCI_STATE_DYING flag is currently only set in xhci_hc_died().
So when this flag is set, we can assume that the command ring and
pending URBs are, or will be, cleaned up. This would change with your patch.
  
We might need some other solution.
Maybe call set_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags) after
CNR (Controller Not Ready) is successfully cleared and the controller is
actually accessible.

We would then need to add checks that the controller is accessible
before queuing any commands to the xHC hardware.
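The alternative can be sketched under stated assumptions: FAKE_STS_CNR uses bit 11 (Controller Not Ready) of USBSTS from the xHCI specification, the bool hw_accessible stands in for HCD_FLAG_HW_ACCESSIBLE, and the simulated status register, polling loop and fake_queue_command() are hypothetical. The accessible flag is set only once CNR reads as clear, and queuing a command checks the flag first.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_STS_CNR (1u << 11) /* USBSTS Controller Not Ready per the xHCI spec */

struct fake_hc {
    uint32_t status;       /* simulated USBSTS */
    int polls_until_ready; /* simulated delay; negative means never ready */
    bool hw_accessible;    /* hypothetical stand-in for HCD_FLAG_HW_ACCESSIBLE */
    int queued;            /* commands that reached the hardware */
};

/* Simulated status read: CNR clears after polls_until_ready reads. */
static uint32_t fake_read_status(struct fake_hc *hc)
{
    if (hc->polls_until_ready == 0)
        hc->status &= ~FAKE_STS_CNR;
    else if (hc->polls_until_ready > 0)
        hc->polls_until_ready--;
    return hc->status;
}

/* Mark the controller accessible only after CNR is seen clear. */
static int fake_wait_cnr_and_mark(struct fake_hc *hc, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (!(fake_read_status(hc) & FAKE_STS_CNR)) {
            hc->hw_accessible = true;
            return 0;
        }
    }
    return -ETIMEDOUT; /* hw_accessible stays false */
}

/* Refuse to queue commands to a controller that never became accessible. */
static int fake_queue_command(struct fake_hc *hc)
{
    if (!hc->hw_accessible)
        return -ENODEV;
    hc->queued++;
    return 0;
}
```

Failing in the queuing path would let callers such as hub_event() return an error rather than block in wait_for_completion().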

Thanks
Mathias
