linux-kernel - Re: [BUG] Oops when SCSI device under multipath is removed

Date:	Tue, 16 Aug 2011 20:26:40 +0900
From:	"Jun'ichi Nomura" <j-nomura@...jp.nec.com>
To:	Alan Stern <stern@...land.harvard.edu>,
	James Bottomley <James.Bottomley@...senPartnership.com>,
	Tejun Heo <tj@...nel.org>
CC:	jaxboe@...ionio.com, roland@...estorage.com,
	linux-scsi@...r.kernel.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	device-mapper development <dm-devel@...hat.com>,
	Kiyoshi Ueda <k-ueda@...jp.nec.com>
Subject: Re: [BUG] Oops when SCSI device under multipath is removed

Hi,

On 08/12/11 00:16, Alan Stern wrote:
> On Thu, 11 Aug 2011, James Bottomley wrote:
>> However, much as I'd like to accept this rosy view, the original oops
>> that started all of this in 2.6.38 was someone caught something with a
>> reference to a SCSI queue after the device release function had been
>> called.
> 
> Not according to your commit log.  You wrote that the reference was
> taken after scsi_remove_device() had been called -- but the device
> release function is scsi_device_dev_release_usercontext().

The commit log of 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
("[SCSI] put stricter guards on queue dead checks") does not
explain why scsi_free_queue() was moved.

But judging from the discussion below, the move seems to have
been made to solve the following self-deadlock:
https://lkml.org/lkml/2011/4/12/9

  [in the context of kblockd_workqueue]
  blk_delay_work
    __blk_run_queue
      scsi_request_fn
        put_device
          (drops the final sdev reference)
          scsi_device_dev_release
            execute_in_process_context(scsi_device_dev_release_usercontext)
              [executed immediately because it's in process context]
              scsi_device_dev_release_usercontext
                scsi_free_queue
                  blk_cleanup_queue
                    blk_sync_queue
                      (waits for blk_delay_work to complete...)

James, is my understanding correct?

If so, would it be possible to move scsi_free_queue() back to
its original place and instead solve the deadlock by not waiting
in the same context, i.e. by always deferring the release to a
work item?

@@ -338,8 +339,8 @@ static void scsi_device_dev_release_user
 static void scsi_device_dev_release(struct device *dev)
 {
 	struct scsi_device *sdp = to_scsi_device(dev);
-	execute_in_process_context(scsi_device_dev_release_usercontext,
-				   &sdp->ew);
+	INIT_WORK(&sdp->ew.work, scsi_device_dev_release_usercontext);
+	schedule_work(&sdp->ew.work);
 }
 
 static struct class sdev_class = {
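
To illustrate the call chain and the proposed change, here is a minimal
single-threaded userspace sketch. It is not kernel code: every name in
it (struct sim, sim_sync_queue, sim_release, sim_delay_work,
sim_run_pending, simulate) is hypothetical, and instead of blocking it
only records the point where the real code would wait for itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model: none of these names are kernel APIs. */
struct sim {
	bool in_delay_work;      /* the modelled blk_delay_work is running */
	bool release_pending;    /* release was deferred to a work item */
	bool self_wait_detected; /* where the real code would block forever */
	bool queue_freed;        /* the modelled queue cleanup finished */
};

/* Models blk_sync_queue(): it has to wait for the delay work to finish. */
static void sim_sync_queue(struct sim *s)
{
	if (s->in_delay_work)
		s->self_wait_detected = true; /* we would wait for ourselves */
}

/* Models the release function: sync the queue, then free it. */
static void sim_release(struct sim *s)
{
	sim_sync_queue(s);
	if (!s->self_wait_detected)
		s->queue_freed = true;
}

/* Models dropping the final reference from inside the delay work. */
static void sim_delay_work(struct sim *s, bool defer)
{
	s->in_delay_work = true;
	if (defer)
		s->release_pending = true; /* stands in for schedule_work() */
	else
		sim_release(s);            /* runs at once, in this context */
	s->in_delay_work = false;
}

/* Models the workqueue running the deferred release later on. */
static void sim_run_pending(struct sim *s)
{
	assert(!s->in_delay_work); /* the delay work has already returned */
	if (s->release_pending) {
		s->release_pending = false;
		sim_release(s);
	}
}

/* Runs one scenario and returns the final state for inspection. */
static struct sim simulate(bool defer)
{
	struct sim s = { false, false, false, false };

	sim_delay_work(&s, defer);
	sim_run_pending(&s);
	return s;
}
```

In this model, simulate(false) ends with self_wait_detected set and
queue_freed clear, which corresponds to the trace above. simulate(true)
ends with queue_freed set and no self-wait, because the release only
runs after the delay work has returned.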

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation