linux-kernel - Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <496d6c6e-565b-9ebb-59b4-d4b17e0d9a62@virtuozzo.com>
Date:   Thu, 9 Nov 2017 17:54:09 +0300
From:   Pavel Tikhomirov <ptikhomirov@...tuozzo.com>
To:     Stuart Hayes <stuart.w.hayes@...il.com>
Cc:     "James E . J . Bottomley" <jejb@...ux.vnet.ibm.com>,
        "Martin K . Petersen" <martin.petersen@...cle.com>,
        Christoph Hellwig <hch@....de>, linux-scsi@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Konstantin Khorenko <khorenko@...tuozzo.com>, devel@...nvz.org
Subject: Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after
 decrementing host_busy

 > Are there any issues with this patch 
(https://patchwork.kernel.org/patch/9938919/) that Pavel Tikhomirov 
submitted back in September?  I am willing to help if there's anything I 
can do to help get it accepted.

Hi, Stuart, I asked James Bottomley about the patch status offlist and 
it seems that the problem is - patch lacks testing and review. I would 
highly appreciate review from your side and anyone who wants to participate!

And if you can confirm that the patch solves the problem on your 
environment with no side effects please add "Tested-by:" tag also.

Thanks, Pavel

On 09/05/2017 03:54 PM, Pavel Tikhomirov wrote:
> We have a problem on several our nodes with scsi EH. Imagine such an
> order of execution of two threads:
> 
> CPU1 scsi_eh_scmd_add		CPU2 scsi_host_queue_ready
> /* shost->host_busy == 1 initialy */
> 
> 				if (shost->shost_state == SHOST_RECOVERY)
> 					/* does not get here */
> 					return 0;
> 
> lock(shost->host_lock);
> shost->shost_state = SHOST_RECOVERY;
> 
> 				busy = shost->host_busy++;
> 				/* host->can_queue == 1 initialy, busy == 1
> 				 * - go to starved label */
> 				lock(shost->host_lock) /* wait */
> 
> shost->host_failed++;
> /* shost->host_busy == 2, shost->host_failed == 1 */
> call scsi_eh_wakeup(shost) {
> 	if (host_busy == host_failed) {
> 		/* does not get here */
> 		wake_up_process(shost->ehandler)
> 	}
> }
> unlock(shost->host_lock)
> 
> 				/* acquire lock */
> 				shost->host_busy--;
> 
> Finaly we do not wakeup scsi_error_handler and all other commands
> coming will hang as we are in never ending recovery state as there
> is no one left to wakeup handler.
> 
> So scsi disc in these host becomes unresponsive and all bio on node
> hangs. (We trigger these problem when scsi cmnds to DVD drive timeout.)
> 
> Main idea of the fix is to try to do wake up every time we decrement
> host_busy or increment host_failed(the latter is already OK).
> 
> Now the very *last* one of busy threads getting host_lock after
> decrementing host_busy will see all write operations on host's
> shost_state, host_busy and host_failed completed thanks to implied
> memory barriers on spin_lock/unlock, so at the time of busy==failed
> we will trigger wakeup in at least one thread. (Thats why putting
> recovery and failed checks under lock)
> 
> Signed-off-by: Pavel Tikhomirov <ptikhomirov@...tuozzo.com>
> ---
>   drivers/scsi/scsi_lib.c | 21 +++++++++++++++++----
>   1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index f6097b89d5d3..6c99221d60aa 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -320,12 +320,11 @@ void scsi_device_unbusy(struct scsi_device *sdev)
>   	if (starget->can_queue > 0)
>   		atomic_dec(&starget->target_busy);
>   
> +	spin_lock_irqsave(shost->host_lock, flags);
>   	if (unlikely(scsi_host_in_recovery(shost) &&
> -		     (shost->host_failed || shost->host_eh_scheduled))) {
> -		spin_lock_irqsave(shost->host_lock, flags);
> +		     (shost->host_failed || shost->host_eh_scheduled)))
>   		scsi_eh_wakeup(shost);
> -		spin_unlock_irqrestore(shost->host_lock, flags);
> -	}
> +	spin_unlock_irqrestore(shost->host_lock, flags);
>   
>   	atomic_dec(&sdev->device_busy);
>   }
> @@ -1503,6 +1502,13 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>   	spin_unlock_irq(shost->host_lock);
>   out_dec:
>   	atomic_dec(&shost->host_busy);
> +
> +	spin_lock_irq(shost->host_lock);
> +	if (unlikely(scsi_host_in_recovery(shost) &&
> +		     (shost->host_failed || shost->host_eh_scheduled)))
> +		scsi_eh_wakeup(shost);
> +	spin_unlock_irq(shost->host_lock);
> +
>   	return 0;
>   }
>   
> @@ -1964,6 +1970,13 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>   
>   out_dec_host_busy:
>   	atomic_dec(&shost->host_busy);
> +
> +	spin_lock_irq(shost->host_lock);
> +	if (unlikely(scsi_host_in_recovery(shost) &&
> +		     (shost->host_failed || shost->host_eh_scheduled)))
> +		scsi_eh_wakeup(shost);
> +	spin_unlock_irq(shost->host_lock);
> +
>   out_dec_target_busy:
>   	if (scsi_target(sdev)->can_queue > 0)
>   		atomic_dec(&scsi_target(sdev)->target_busy);
> 

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.