linux-kernel - Re: [PATCH v4 1/8] scsi: ufs: Flush exception event before suspend

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <525e4f67-f471-54a6-aaea-b3772a550af1@acm.org>
Date:   Sat, 25 Jan 2020 19:29:55 -0800
From:   Bart Van Assche <bvanassche@....org>
To:     Can Guo <cang@...eaurora.org>, asutoshd@...eaurora.org,
        nguyenb@...eaurora.org, hongwus@...eaurora.org,
        rnayak@...eaurora.org, linux-scsi@...r.kernel.org,
        kernel-team@...roid.com, saravanak@...gle.com, salyzyn@...gle.com
Cc:     Sayali Lokhande <sayalil@...eaurora.org>,
        Alim Akhtar <alim.akhtar@...sung.com>,
        Avri Altman <avri.altman@....com>,
        Pedro Sousa <pedrom.sousa@...opsys.com>,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Stanley Chu <stanley.chu@...iatek.com>,
        Bean Huo <beanhuo@...ron.com>,
        Venkat Gopalakrishnan <venkatg@...eaurora.org>,
        Tomas Winkler <tomas.winkler@...el.com>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 1/8] scsi: ufs: Flush exception event before suspend

On 2020-01-22 23:25, Can Guo wrote:
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 1201578..c2de29f 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -4760,8 +4760,15 @@ static void ufshcd_slave_destroy(struct scsi_device *sdev)
>  			 * UFS device needs urgent BKOPs.
>  			 */
>  			if (!hba->pm_op_in_progress &&
> -			    ufshcd_is_exception_event(lrbp->ucd_rsp_ptr))
> -				schedule_work(&hba->eeh_work);
> +			    ufshcd_is_exception_event(lrbp->ucd_rsp_ptr)) {
> +				/*
> +				 * Prevent suspend once eeh_work is scheduled
> +				 * to avoid deadlock between ufshcd_suspend
> +				 * and exception event handler.
> +				 */
> +				if (schedule_work(&hba->eeh_work))
> +					pm_runtime_get_noresume(hba->dev);
> +			}

Please combine the two logical tests with "&&" instead of nesting two
if-statements inside each other.

>  			break;
>  		case UPIU_TRANSACTION_REJECT_UPIU:
>  			/* TODO: handle Reject UPIU Response */
> @@ -5215,7 +5222,14 @@ static void ufshcd_exception_event_handler(struct work_struct *work)
>  
>  out:
>  	scsi_unblock_requests(hba->host);
> -	pm_runtime_put_sync(hba->dev);
> +	/*
> +	 * pm_runtime_get_noresume is called while scheduling
> +	 * eeh_work to avoid suspend racing with exception work.
> +	 * Hence decrement usage counter using pm_runtime_put_noidle
> +	 * to allow suspend on completion of exception event handler.
> +	 */
> +	pm_runtime_put_noidle(hba->dev);
> +	pm_runtime_put(hba->dev);
>  	return;
>  }
>  
> @@ -7901,6 +7915,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>  			goto enable_gating;
>  	}
>  
> +	flush_work(&hba->eeh_work);
>  	ret = ufshcd_link_state_transition(hba, req_link_state, 1);
>  	if (ret)
>  		goto set_dev_active;

I think this patch introduces a new race condition, namely the following:
- ufshcd_slave_destroy() tests pm_op_in_progress and reads the value
  zero from that variable.
- ufshcd_suspend() sets hba->pm_op_in_progress to one.
- ufshcd_slave_destroy() calls schedule_work().

How about fixing this race condition by calling
pm_runtime_get_noresume() before checking pm_op_in_progress and by
reallowing resume if no work is scheduled?

Thanks,

Bart.