linux-kernel - Re: [PATCH] scsi_sysfs: protect against double execution of __scsi_remove


Message-ID: <56291DD2.90104@sandisk.com>
Date:	Thu, 22 Oct 2015 10:33:06 -0700
From:	Bart Van Assche <bart.vanassche@...disk.com>
To:	Vitaly Kuznetsov <vkuznets@...hat.com>,
	"James E.J. Bottomley" <JBottomley@...n.com>
CC:	<linux-scsi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	"K. Y. Srinivasan" <kys@...rosoft.com>
Subject: Re: [PATCH] scsi_sysfs: protect against double execution of
 __scsi_remove_device()

On 10/22/2015 10:12 AM, Vitaly Kuznetsov wrote:
> On some host errors storvsc module tries to remove sdev by scheduling a job
> which does the following:
>
>     sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun);
>     if (sdev) {
>         scsi_remove_device(sdev);
>         scsi_device_put(sdev);
>     }
>
> While this code seems correct the following crash is observed:
>
>   general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
>   RIP: 0010:[<ffffffff81169979>]  [<ffffffff81169979>] bdi_destroy+0x39/0x220
>   ...
>   [<ffffffff814aecdc>] ? _raw_spin_unlock_irq+0x2c/0x40
>   [<ffffffff8127b7db>] blk_cleanup_queue+0x17b/0x270
>   [<ffffffffa00b54c4>] __scsi_remove_device+0x54/0xd0 [scsi_mod]
>   [<ffffffffa00b556b>] scsi_remove_device+0x2b/0x40 [scsi_mod]
>   [<ffffffffa00ec47d>] storvsc_remove_lun+0x3d/0x60 [hv_storvsc]
>   [<ffffffff81080791>] process_one_work+0x1b1/0x530
>   ...
>
> The problem comes with the fact that many such jobs (for the same device)
> are being scheduled simultaneously. While scsi_remove_device() uses
> shost->scan_mutex and scsi_device_lookup() will fail for a device in
> SDEV_DEL state there is no protection against someone who did
> scsi_device_lookup() before we actually entered __scsi_remove_device(). So
> the whole scenario looks like this: two callers call scsi_device_lookup()
> simultaneously (or preemption happens) and both calls succeed; after that
> both callers try doing scsi_remove_device().
> shost->scan_mutex only serializes their calls to __scsi_remove_device()
> and we end up doing the cleanup path twice.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@...hat.com>
> ---
>   drivers/scsi/scsi_sysfs.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index b333389..e0d2707 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1076,6 +1076,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
>   {
>   	struct device *dev = &sdev->sdev_gendev;
>
> +	/*
> +	 * This cleanup path is not reentrant and while it is impossible
> +	 * to get a new reference with scsi_device_get() someone can still
> +	 * hold a previously acquired one.
> +	 */
> +	if (sdev->sdev_state == SDEV_DEL)
> +		return;
> +
>   	if (sdev->is_visible) {
>   		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
>   			return;

Hello Vitaly,

Sorry, but I don't see how the above patch could be a proper fix. If two
calls to __scsi_remove_device() occur concurrently, the crash explained
above can still occur. The storvsc driver should be modified such that
concurrent __scsi_remove_device() calls do not occur. How about
preventing concurrent calls via a mutex? Another possible approach is
to use the workqueue mechanism. An example can be found in the SRP
initiator driver (ib_srp).
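
To illustrate the mutex idea, here is a minimal user-space sketch using
pthreads. It is not kernel code and not a proposed storvsc change: all
names in it (model_dev, model_lookup, model_remove, remove_job,
run_model) are hypothetical stand-ins, and it assumes the driver can
hold one private lock across both the lookup and the removal. Because
the lookup happens inside the lock, every job after the first sees the
device as deleted and skips the cleanup.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these names exist in the kernel. */
enum model_state { MODEL_RUNNING, MODEL_DEL };

struct model_dev {
	pthread_mutex_t remove_lock;	/* models a driver-private mutex */
	enum model_state state;
	int cleanup_count;		/* how many times cleanup actually ran */
};

/* Models a lookup that fails once the device has been deleted. */
static struct model_dev *model_lookup(struct model_dev *d)
{
	return d->state == MODEL_DEL ? NULL : d;
}

/* Models a non-reentrant cleanup path: it must run at most once. */
static void model_remove(struct model_dev *d)
{
	assert(d->state != MODEL_DEL);
	d->cleanup_count++;
	d->state = MODEL_DEL;
}

/* One removal job: lookup and removal form a single critical section. */
static void *remove_job(void *arg)
{
	struct model_dev *d = arg;
	struct model_dev *found;

	pthread_mutex_lock(&d->remove_lock);
	found = model_lookup(d);
	if (found)
		model_remove(found);
	pthread_mutex_unlock(&d->remove_lock);
	return NULL;
}

#define MODEL_MAX_JOBS 16

/* Runs up to MODEL_MAX_JOBS concurrent jobs; returns the cleanup count. */
int run_model(int njobs)
{
	struct model_dev d;
	pthread_t tid[MODEL_MAX_JOBS];
	int i, started = 0;

	pthread_mutex_init(&d.remove_lock, NULL);
	d.state = MODEL_RUNNING;
	d.cleanup_count = 0;

	if (njobs > MODEL_MAX_JOBS)
		njobs = MODEL_MAX_JOBS;
	for (i = 0; i < njobs; i++)
		if (pthread_create(&tid[started], NULL, remove_job, &d) == 0)
			started++;
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	pthread_mutex_destroy(&d.remove_lock);
	return d.cleanup_count;
}
```

In this model the assert in model_remove() stands in for the crash: it
would fire if two jobs both passed the lookup before either one removed
the device, which the lock around both steps is meant to rule out.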

Bart.