Date:   Thu, 4 Apr 2019 03:57:31 +0000
From:   <Kento.A.Kobayashi@...y.com>
To:     <stern@...land.harvard.edu>
CC:     <oneukum@...e.com>, <gregkh@...uxfoundation.org>,
        <usb-storage@...ts.one-eyed-alien.net>, <Jacky.Cao@...y.com>,
        <linux-kernel@...r.kernel.org>, <linux-scsi@...r.kernel.org>,
        <linux-usb@...r.kernel.org>, <Kento.A.Kobayashi@...y.com>
Subject: RE: [PATCH] usb: uas: fix usb subsystem hang after power off hub
 port

Hi,

>> Root Cause
>> - A block layer timeout occurs after the UAS USB device that is being accessed is powered off (the reproduction step). While the timeout error handler runs, the SCSI host state becomes SHOST_CANCEL_RECOVERY, which causes I/O to hang so that the lock cannot be released. In the end, the USB subsystem hangs.
>> The function call path is as follows:
>> blk_mq_timeout_work 
>>   …->scsi_times_out  (… means some functions are not listed before this function.)
>>     …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) 
>>       … -> scsi_error_handler
>>         …-> uas_eh_device_reset_handler
>>             -> usb_lock_device_for_reset  <- take lock
>>               -> usb_reset_device
>>                 …-> rebind = uas_post_reset (return 1 since ENODEV) 
>>                 …-> usb_unbind_and_rebind_marked_interfaces (rebind=1)
>>                    …-> uas_disconnect  (scsi_host_set_state to SHOST_CANCEL_RECOVERY)
>>                         … -> scsi_queue_rq
>
>How does scsi_queue_rq get called here?  As far as I can see, this shouldn't happen.

We confirmed the function call path on Linux 4.9, which is the version we are working on, when this problem occurred. In Linux 4.9, the last function is scsi_request_fn instead of scsi_queue_rq. In staging.git, we think scsi_queue_rq is called via the following path:
uas_disconnect
|- scsi_remove_host
 |- scsi_forget_host
  |- __scsi_remove_device
   |- device_del
    |- bus_remove_device
     |- device_release_driver
      |- device_release_driver_internal
       |- __device_release_driver
        |- drv->remove(dev) (sd_remove)  
         |- sd_shutdown
          |- sd_sync_cache
           |- scsi_execute
            |- __scsi_execute
             |- blk_execute_rq
              |- blk_execute_rq_nowait
               |- blk_mq_sched_insert_request
                |- blk_mq_run_hw_queue
                 |- __blk_mq_delay_run_hw_queue
                  |- __blk_mq_run_hw_queue
                   |- blk_mq_sched_dispatch_requests
                    |- blk_mq_dispatch_rq_list
                     |- q->mq_ops->queue_rq (scsi_queue_rq)
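
To illustrate why reaching scsi_queue_rq from this path is a problem, here is a simplified user-space model, not kernel code. It assumes, as described in the root cause, that a host in SHOST_RECOVERY or SHOST_CANCEL_RECOVERY does not dispatch new requests. Every model_* and MODEL_* name is hypothetical and exists only for this sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SCSI host states named in this thread. */
enum model_shost_state {
	MODEL_SHOST_RUNNING,
	MODEL_SHOST_RECOVERY,
	MODEL_SHOST_CANCEL_RECOVERY,
};

/* Assumption of this model: both recovery states block new commands. */
bool model_host_in_recovery(enum model_shost_state state)
{
	return state == MODEL_SHOST_RECOVERY ||
	       state == MODEL_SHOST_CANCEL_RECOVERY;
}

/*
 * Returns true if a synchronous request, such as the cache sync issued
 * from sd_shutdown, could be dispatched in the given host state.
 */
bool model_can_dispatch(enum model_shost_state state)
{
	return !model_host_in_recovery(state);
}
```

In this model, a request issued while the host is in SHOST_CANCEL_RECOVERY cannot be dispatched. Since the request is issued from the error handler's own call path, the caller would wait for I/O that cannot proceed, which would be consistent with the hang described in the root cause.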

>> Countermeasure
>> - Make uas_post_reset not return 1 when uas_configure_endpoints returns ENODEV, since usb_unbind_and_rebind_marked_interfaces doesn't need to perform the unbind/rebind operations in this situation.
>> blk_mq_timeout_work
>>   …->scsi_times_out  (… means some functions are not listed before this function.)
>>     …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) 
>>       … -> scsi_error_handler
>>        …-> uas_eh_device_reset_handler (*1)
>>            -> usb_lock_device_for_reset  <- take lock
>>              -> usb_reset_device
>>                -> usb_reset_and_verify_device (return ENODEV and FAILED will be reported to *1)
>>                -> uas_post_reset returns 0 when ENODEV => rebind=0 
>>                -> usb_unbind_and_rebind_marked_interfaces (rebind=0)
>
>The difference is that uas_disconnect wasn't called here.  But that routine should not cause any problems -- you're always supposed to be able to unbind a driver from a device.  So it looks like this is not the right way to solve the problem.

We confirmed that usb_driver_release_interface calls usb_unbind_interface when this problem occurs.
usb_unbind_interface then calls the driver's disconnect callback.
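
For reference, the return-value change proposed in the countermeasure can be sketched as a small user-space model. This is not the uas driver code: model_post_reset_rebind and its skip_rebind_on_enodev flag are hypothetical names, and returning 1 for every other error is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the value a post_reset callback returns.
 * A nonzero result asks the USB core to unbind and rebind the interface.
 *
 * err:                   result of endpoint configuration (0 or -errno)
 * skip_rebind_on_enodev: true models the proposed countermeasure
 */
int model_post_reset_rebind(int err, bool skip_rebind_on_enodev)
{
	if (err == 0)
		return 0;	/* configuration succeeded, no rebind needed */

	if (skip_rebind_on_enodev && err == -ENODEV)
		return 0;	/* device is gone, so rebinding cannot help */

	return 1;		/* assumption: other errors still request a rebind */
}
```

With skip_rebind_on_enodev set to false the model returns 1 for -ENODEV, matching the rebind=1 path in the root cause trace. With it set to true the model returns 0, matching the rebind=0 path in the countermeasure trace.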

Regards,
Kento Kobayashi
