linux-kernel - RE: [PATCH] usb: uas: fix usb subsystem hang after power off hub port

Date:   Thu, 4 Apr 2019 15:33:38 -0400 (EDT)
From:   Alan Stern <stern@...land.harvard.edu>
To:     Kento.A.Kobayashi@...y.com,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>
cc:     Oliver Neukum <oneukum@...e.com>, <gregkh@...uxfoundation.org>,
        USB Storage list <usb-storage@...ts.one-eyed-alien.net>,
        <Jacky.Cao@...y.com>,
        Kernel development list <linux-kernel@...r.kernel.org>,
        SCSI development list <linux-scsi@...r.kernel.org>,
        USB list <linux-usb@...r.kernel.org>
Subject: RE: [PATCH] usb: uas: fix usb subsystem hang after power off hub
 port

On Thu, 4 Apr 2019 Kento.A.Kobayashi@...y.com wrote:

> Hi,
> 
> >> Root Cause
> >> - A block-layer timeout occurs after powering off a UAS USB device that is being accessed (the reproduction step). While the timeout error handler runs, the SCSI host state becomes SHOST_CANCEL_RECOVERY, which causes I/O to hang and prevents the lock from being released. Eventually, the USB subsystem hangs.
> >> The function call path is as follows:
> >> blk_mq_timeout_work 
> >>   …->scsi_times_out  (… means some functions are not listed before this function.)
> >>     …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) 
> >>       … -> scsi_error_handler
> >>         …-> uas_eh_device_reset_handler
> >>             -> usb_lock_device_for_reset  <- take lock
> >>               -> usb_reset_device
> >>                 …-> rebind = uas_post_reset (return 1 since ENODEV) 
> >>                 …-> usb_unbind_and_rebind_marked_interfaces (rebind=1)
> >>                    …-> uas_disconnect  (scsi_host_set_state to SHOST_CANCEL_RECOVERY)
> >>                         … -> scsi_queue_rq
> >>                              -> scsi_host_queue_ready(return 0 causes IO hangs up.)
> >
> >How does scsi_queue_rq get called here?  As far as I can see, this shouldn't happen.
> 
> We confirmed the function call path on Linux 4.9, where this problem occurred, since that is the version we are working on. In Linux 4.9, the last function is scsi_request_fn instead of scsi_queue_rq. In staging.git, we think scsi_queue_rq is called through the following path.
> uas_disconnect
> |- scsi_remove_host
>  |- scsi_forget_host
>   |- __scsi_remove_device
>    |- device_del
>     |- bus_remove_device
>      |- device_release_driver
>       |- device_release_driver_internal
>        |- __device_release_driver
>         |- drv->remove(dev) (sd_remove)  
>          |- sd_shutdown
>           |- sd_sync_cache
>            |- scsi_execute
... (unnecessary internal details elided)
>                     |- blk_mq_dispatch_rq_list
>                      |- q->mq_ops->queue_rq (scsi_queue_rq)

So it looks as though the SCSI subsystem doesn't like to have a reset 
handler call scsi_remove_host.  Commands dispatched by the removal 
routines are forced to wait for the reset recovery to finish, which 
won't happen until those commands have been completed.
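
The circular wait described above can be illustrated with a toy user-space 
model. This is only a sketch under stated assumptions: the names Host, 
HostState, remove_host and run_error_handler are hypothetical, none of it 
is kernel code, and the readiness check is simplified to "dispatch only 
while the host is running". Where the kernel would block, the model 
returns False so that the stall can be observed.

```python
from enum import Enum, auto


class HostState(Enum):
    RUNNING = auto()
    RECOVERY = auto()
    CANCEL_RECOVERY = auto()


class Host:
    """Toy stand-in for a SCSI host (hypothetical, not struct Scsi_Host)."""

    def __init__(self):
        self.state = HostState.RUNNING
        self.blocked = []  # commands that could not be dispatched

    def queue_ready(self):
        # Simplified model of the readiness check: refuse during recovery.
        return self.state is HostState.RUNNING

    def dispatch(self, cmd):
        # True if the command completes, False if it would block.
        if not self.queue_ready():
            self.blocked.append(cmd)
            return False
        return True


def remove_host(host):
    # Models the removal path ending in a cache-sync command.
    host.state = HostState.CANCEL_RECOVERY
    return host.dispatch("SYNCHRONIZE CACHE")


def run_error_handler(host, device_present):
    # Models the error handler invoking the reset handler.
    host.state = HostState.RECOVERY
    if not device_present and not remove_host(host):
        # Removal waits for the command, the command waits for recovery
        # to end, and recovery ends only when this function returns.
        return False
    host.state = HostState.RUNNING
    return True


if __name__ == "__main__":
    unplugged = Host()
    print(run_error_handler(unplugged, device_present=False), unplugged.blocked)
    healthy = Host()
    print(run_error_handler(healthy, device_present=True), healthy.blocked)
```

In the unplugged case the model never leaves the recovery state and the 
cache-sync command stays blocked, which is the dependency loop in 
question.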

Is this a bug in the SCSI core?  If not, we need to know what is the
right way to do things when a reset handler detects that the SCSI host
has been hot-unplugged.

James, Martin, any suggestions?

Alan Stern