Date: Mon, 03 Aug 2020 20:04:39 +0800
From: Can Guo <cang@...eaurora.org>
To: Stanley Chu <stanley.chu@...iatek.com>
Cc: linux-scsi@...r.kernel.org, martin.petersen@...cle.com,
    avri.altman@....com, alim.akhtar@...sung.com, jejb@...ux.ibm.com,
    bvanassche@....org, beanhuo@...ron.com, asutoshd@...eaurora.org,
    matthias.bgg@...il.com, linux-mediatek@...ts.infradead.org,
    linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
    kuohong.wang@...iatek.com, peter.wang@...iatek.com,
    chun-hung.wu@...iatek.com, andy.teng@...iatek.com,
    chaotian.jing@...iatek.com, cc.chou@...iatek.com,
    jiajie.hao@...iatek.com
Subject: Re: [PATCH v7] scsi: ufs: Quiesce all scsi devices before shutdown

Slightly updated my comments

On 2020-08-03 19:50, Can Guo wrote:
> Hi Stanley,
>
> On 2020-08-03 18:04, Stanley Chu wrote:
>> Currently I/O request could be still submitted to UFS device while
>> UFS is working on shutdown flow. This may lead to racing as below
>> scenarios and finally system may crash due to unclocked register
>> accesses.
>>
>> To fix this kind of issues, in ufshcd_shutdown(),
>>
>> 1. Use pm_runtime_get_sync() instead of resuming UFS device by
>>    ufshcd_runtime_resume() "internally" to let runtime PM framework
>>    manage and prevent concurrent runtime operations by incoming I/O
>>    requests.
>>
>> 2. Specifically quiesce all SCSI devices to block all I/O requests
>>    after device is resumed.
>>
>> Example of racing scenario: While UFS device is runtime-suspended
>>
>> Thread #1: Executing UFS shutdown flow, e.g.,
>>            ufshcd_suspend(UFS_SHUTDOWN_PM)
>>
>> Thread #2: Executing runtime resume flow triggered by I/O request,
>>            e.g., ufshcd_resume(UFS_RUNTIME_PM)
>>
>> This breaks the assumption that UFS PM flows can not be running
>> concurrently and some unexpected racing behavior may happen.
>>
>> Signed-off-by: Stanley Chu <stanley.chu@...iatek.com>
>> ---
>> Changes:
>>   - Since v6:
>>     - Do quiesce to all SCSI devices.
>>   - Since v4:
>>     - Use pm_runtime_get_sync() instead of resuming UFS device by
>>       ufshcd_runtime_resume() "internally".
>> ---
>>  drivers/scsi/ufs/ufshcd.c | 27 ++++++++++++++++++++++-----
>>  1 file changed, 22 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 307622284239..7cb220b3fde0 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -8640,6 +8640,7 @@ EXPORT_SYMBOL(ufshcd_runtime_idle);
>>  int ufshcd_shutdown(struct ufs_hba *hba)
>>  {
>>  	int ret = 0;
>> +	struct scsi_target *starget;
>>
>>  	if (!hba->is_powered)
>>  		goto out;
>> @@ -8647,11 +8648,27 @@ int ufshcd_shutdown(struct ufs_hba *hba)
>>  	if (ufshcd_is_ufs_dev_poweroff(hba) && ufshcd_is_link_off(hba))
>>  		goto out;
>>
>> -	if (pm_runtime_suspended(hba->dev)) {
>> -		ret = ufshcd_runtime_resume(hba);
>> -		if (ret)
>> -			goto out;
>> -	}
>> +	/*
>> +	 * Let runtime PM framework manage and prevent concurrent runtime
>> +	 * operations with shutdown flow.
>> +	 */
>> +	pm_runtime_get_sync(hba->dev);
>> +
>> +	/*
>> +	 * Quiesce all SCSI devices to prevent any non-PM requests sending
>> +	 * from block layer during and after shutdown.
>> +	 *
>> +	 * Here we can not use blk_cleanup_queue() since PM requests
>> +	 * (with BLK_MQ_REQ_PREEMPT flag) are still required to be sent
>> +	 * through block layer. Therefore SCSI command queued after the
>> +	 * scsi_target_quiesce() call returned will block until
>> +	 * blk_cleanup_queue() is called.
>> +	 *
>> +	 * Besides, scsi_target_"un"quiesce (e.g., scsi_target_resume) can
>> +	 * be ignored since shutdown is one-way flow.
>> +	 */
>> +	list_for_each_entry(starget, &hba->host->__targets, siblings)
>> +		scsi_target_quiesce(starget);
>>
>
> Sorry for misleading you to scsi_target_quiesce(), maybe below is better.
>
>     shost_for_each_device(sdev, hba->host)
>         scsi_device_quiesce(sdev);
>
> We may need to discuss more about this quiesce part since I missed
> something.
>
> After we quiesce the scsi devices, only PM requests are allowed, but it
> is still not safe: [1] PM requests can still pass through, [2] there can
> be tasks/reqs present in doorbells before the devices are quiesced. So,
> these tasks/reqs in [1] and [2] can still be flying in parallel while
> ufshcd_suspend is running.
>
> How about only quiescing the UFS device well known scsi device but using
> freeze_queue to the other scsi devices? blk_mq_freeze_queue can eliminate
> the risks mentioned in [1] and [2].
>
>     shost_for_each_device(sdev, hba->host) {
>         if (sdev == hba->sdev_ufs_device)
>             scsi_device_quiesce(sdev);
>         else
>             blk_mq_freeze_queue(sdev->request_queue);
>     }
>
> IF blk_mq_freeze_queue is not allowed to be used by LLD (I think we can
> use it as I recalled Bart used to use it in one of his changes to UFS
> scaling), we can use scsi_remove_device instead, it changes scsi device's
> state to SDEV_DEL and calls blk_cleanup_queue.
>
> We can also make changes like below. [1] is to make sure no more PM
> requests sent to scsi devices, [2] is make sure doorbells are cleared
> before invoke ufshcd_suspend.
>
>     shost_for_each_device(sdev, hba->host) {
>         scsi_autopm_get_device(sdev);    [1]
>         scsi_device_quiesce(sdev);
>     }
>
>     ufshcd_wait_for_doorbell_clr(hba, U64_MAX);    [2]
>
> Please let me know which one you prefer or if you have better idea,
> thanks!
>
> Regards,
>
> Can Guo.
>
>>  	ret = ufshcd_suspend(hba, UFS_SHUTDOWN_PM);
>>  out:
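
Editorial sketch (not part of the original message): the ordering in the last proposal above (take a runtime-PM reference and quiesce every device, then wait for the doorbells to clear, and only then suspend) can be illustrated with a small userspace model in C. This is a toy under stated assumptions, not kernel code. Every model_* name is hypothetical, a set doorbell bit stands for one in-flight request, and blocked I/O is simply rejected here, whereas the real block layer would make the submitter wait.

```c
/* Userspace toy model of the proposed shutdown ordering. NOT kernel code;
 * all model_* names are hypothetical stand-ins chosen for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MODEL_NR_DEVS = 4 };

struct model_dev {
	bool quiesced;   /* true once normal (non-PM) I/O is blocked */
	int pm_usage;    /* runtime-PM usage count held on this device */
};

struct model_hba {
	struct model_dev devs[MODEL_NR_DEVS];
	unsigned int doorbell;   /* one bit per in-flight request, tags 0..31 */
	bool suspended;
};

/* Submit a normal request; returns false if it is not let through. */
static bool model_submit_io(struct model_hba *hba, size_t dev, unsigned int tag)
{
	if (hba->suspended || hba->devs[dev].quiesced)
		return false;
	hba->doorbell |= 1u << tag;
	return true;
}

/* Pretend the hardware completes one request: clear the lowest set bit. */
static void model_complete_one(struct model_hba *hba)
{
	hba->doorbell &= hba->doorbell - 1;
}

/* Stand-in for waiting until all doorbells are clear, step [2]. */
static void model_wait_for_doorbell_clr(struct model_hba *hba)
{
	while (hba->doorbell)
		model_complete_one(hba);
}

static void model_shutdown(struct model_hba *hba)
{
	for (size_t i = 0; i < MODEL_NR_DEVS; i++) {
		hba->devs[i].pm_usage++;       /* step [1]: hold a PM reference */
		hba->devs[i].quiesced = true;  /* block further normal I/O */
	}

	model_wait_for_doorbell_clr(hba);      /* step [2]: drain in-flight work */

	/* Nothing may be in flight when the suspend path starts. */
	assert(hba->doorbell == 0);
	hba->suspended = true;                 /* stand-in for the suspend call */
}
```

In this model, requests submitted before model_shutdown() are drained before the suspend step, and requests submitted afterwards are refused, which is the property the thread is trying to guarantee for ufshcd_suspend().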