linux-kernel - Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1511283197.2692.13.camel@wdc.com>
Date:   Tue, 21 Nov 2017 16:53:18 +0000
From:   Bart Van Assche <Bart.VanAssche@....com>
To:     "jejb@...ux.vnet.ibm.com" <jejb@...ux.vnet.ibm.com>,
        "ptikhomirov@...tuozzo.com" <ptikhomirov@...tuozzo.com>,
        "martin.petersen@...cle.com" <martin.petersen@...cle.com>
CC:     "hch@....de" <hch@....de>,
        "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "devel@...nvz.org" <devel@...nvz.org>,
        "khorenko@...tuozzo.com" <khorenko@...tuozzo.com>
Subject: Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after
 decrementing host_busy

On Tue, 2017-09-05 at 15:54 +0300, Pavel Tikhomirov wrote:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index f6097b89d5d3..6c99221d60aa 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -320,12 +320,11 @@ void scsi_device_unbusy(struct scsi_device *sdev)
>  	if (starget->can_queue > 0)
>  		atomic_dec(&starget->target_busy);
>  
> +	spin_lock_irqsave(shost->host_lock, flags);
>  	if (unlikely(scsi_host_in_recovery(shost) &&
> -		     (shost->host_failed || shost->host_eh_scheduled))) {
> -		spin_lock_irqsave(shost->host_lock, flags);
> +		     (shost->host_failed || shost->host_eh_scheduled)))
>  		scsi_eh_wakeup(shost);
> -		spin_unlock_irqrestore(shost->host_lock, flags);
> -	}
> +	spin_unlock_irqrestore(shost->host_lock, flags);
>  
>  	atomic_dec(&sdev->device_busy);
>  }
> @@ -1503,6 +1502,13 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>  	spin_unlock_irq(shost->host_lock);
>  out_dec:
>  	atomic_dec(&shost->host_busy);
> +
> +	spin_lock_irq(shost->host_lock);
> +	if (unlikely(scsi_host_in_recovery(shost) &&
> +		     (shost->host_failed || shost->host_eh_scheduled)))
> +		scsi_eh_wakeup(shost);
> +	spin_unlock_irq(shost->host_lock);
> +
>  	return 0;
>  }
>  
> @@ -1964,6 +1970,13 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>  
>  out_dec_host_busy:
>  	atomic_dec(&shost->host_busy);
> +
> +	spin_lock_irq(shost->host_lock);
> +	if (unlikely(scsi_host_in_recovery(shost) &&
> +		     (shost->host_failed || shost->host_eh_scheduled)))
> +		scsi_eh_wakeup(shost);
> +	spin_unlock_irq(shost->host_lock);
> +
>  out_dec_target_busy:
>  	if (scsi_target(sdev)->can_queue > 0)
>  		atomic_dec(&scsi_target(sdev)->target_busy);

An important achievement of the scsi-mq code was removal of all
spin_lock_irq(shost->host_lock) statements from the hot path. The above
changes will have a significant negative performance impact, especially if
multiple LUNs associated with the same SCSI host are involved. Can the
reported race be fixed without slowing down the hot path significantly? I
think that both adding spin lock or smp_mb() calls in the hot path will
have a significant negative performance impact.

Thanks,

Bart.