Date: Fri, 01 May 2020 09:42:17 +0800
From: Can Guo <cang@...eaurora.org>
To: Bart Van Assche <bvanassche@....org>
Cc: asutoshd@...eaurora.org, nguyenb@...eaurora.org, hongwus@...eaurora.org,
	rnayak@...eaurora.org, stanley.chu@...iatek.com, alim.akhtar@...sung.com,
	beanhuo@...ron.com, Avri.Altman@....com, bjorn.andersson@...aro.org,
	linux-scsi@...r.kernel.org, kernel-team@...roid.com, saravanak@...gle.com,
	salyzyn@...gle.com, "James E.J. Bottomley" <jejb@...ux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@...cle.com>,
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 1/1] scsi: pm: Balance pm_only counter of request queue during system resume

On 2020-05-01 04:32, Bart Van Assche wrote:
> On 2020-04-29 22:40, Can Guo wrote:
>> On 2020-04-30 13:08, Bart Van Assche wrote:
>>> On 2020-04-29 21:10, Can Guo wrote:
>>>> During system resume, scsi_device_resume() decreases a request
>>>> queue's pm_only counter if the scsi device was quiesced before. But
>>>> after that, if the scsi device's RPM status is RPM_SUSPENDED, the
>>>> pm_only counter is still held (non-zero). The current scsi resume
>>>> hook only sets the RPM status of the scsi device and its request
>>>> queue to RPM_ACTIVE, but leaves the pm_only counter unchanged. This
>>>> may leave the request queue's pm_only counter non-zero after the
>>>> resume hook returns, so anyone waiting on mq_freeze_wq would never
>>>> be woken up. Fix this by calling blk_post_runtime_resume() if
>>>> pm_only is non-zero, to balance the pm_only counter held by the
>>>> scsi device's RPM ops.
>>>
>>> How was this issue discovered? How has this patch been tested?
>>
>> The issue was discovered during system suspend/resume testing, and it
>> is very easy to replicate. After the system resumes, if this issue
>> hits some scsi devices, all bios sent to their request queues are
>> blocked, which may hang the system if those scsi devices are vital to
>> system functionality.
>>
>> To make sure the patch works well, we tested system suspend/resume
>> and confirmed that no system hang happens due to request queues being
>> blocked by an imbalanced pm_only counter.
>
> Thanks, that's very interesting information. My concern with this patch
> is that the power management code is not the only caller of
> blk_set_pm_only() / blk_clear_pm_only(). E.g. the SCSI SPI code also
> calls scsi_device_quiesce() and scsi_device_resume(). These last
> functions call blk_set_pm_only() and blk_clear_pm_only(). More calls of
> scsi_device_quiesce() and scsi_device_resume() might be added in the
> future.
>
> Has it been considered to test directly whether a SCSI device has been
> runtime suspended instead of relying on blk_queue_pm_only()? How about
> using pm_runtime_status_suspended() or adding a function in
> block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED?
>
> Thanks,
>
> Bart.

Hi Bart,

I have slightly revised my previous mail. Please let me address your
concern.
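To ground the discussion, here is a condensed sketch of the two
functions in question, based on mainline drivers/scsi/scsi_lib.c
around this time; the state mutex and the queue freeze/unfreeze steps
are omitted, so treat it as an illustration rather than the exact code:

int scsi_device_quiesce(struct scsi_device *sdev)
{
	struct request_queue *q = sdev->request_queue;
	int err;

	blk_set_pm_only(q);		/* pm_only++ */

	/* ... freeze/unfreeze the queue so pm_only takes effect ... */

	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
	if (err == 0)
		sdev->quiesced_by = current;
	else
		blk_clear_pm_only(q);	/* pm_only--: QUIESCE -> QUIESCE is
					 * an illegal transition (-EINVAL),
					 * so a repeated quiesce is undone
					 * right here. */
	return err;
}

void scsi_device_resume(struct scsi_device *sdev)
{
	/* Drop pm_only only if this sdev was actually quiesced. */
	if (sdev->quiesced_by) {
		sdev->quiesced_by = NULL;
		blk_clear_pm_only(sdev->request_queue);	/* pm_only-- */
	}
	if (sdev->sdev_state == SDEV_QUIESCE)
		scsi_device_set_state(sdev, SDEV_RUNNING);
}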
First of all, it is allowed to call scsi_device_quiesce() multiple
times, but an sdev's request queue's pm_only counter can only be
increased once by scsi_device_quiesce(). If the sdev has already been
quiesced, scsi_device_set_state(sdev, SDEV_QUIESCE) inside
scsi_device_quiesce() returns -EINVAL (illegal state transition), in
which case blk_clear_pm_only() is called to decrease pm_only again. So
no matter how many times scsi_device_quiesce() is called, it increases
pm_only only once. scsi_device_resume() is symmetric: it calls
blk_clear_pm_only() only once, and only if the sdev was actually
quiesced.

In short, after scsi_device_resume() returns in scsi_dev_type_resume():
if an sdev has block layer runtime PM enabled (sdev->request_queue->dev
is not NULL), its queue's pm_only counter should be 1 (if the sdev's
runtime power status is RPM_SUSPENDED) or 0 (if it is RPM_ACTIVE). If
an sdev has block layer runtime PM disabled (sdev->request_queue->dev
is NULL), its queue's pm_only counter should be 0.

> Has it been considered to test directly whether a SCSI device has been
> runtime suspended instead of relying on blk_queue_pm_only()? How about
> using pm_runtime_status_suspended() or adding a function in
> block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED?

Yes, I made an earlier version of the patch that way, and it also
worked well; the two approaches are actually equivalent here. I prefer
the current code because we should be confident that after
scsi_dev_type_resume() returns, pm_only must be 0. Different reviewers
may have different opinions, and either way works well.

Thanks,

Can Guo.
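P.S. For readers following along, a simplified sketch of what the v3
patch does in the resume path, reconstructed from the commit message
above; error handling and the surrounding helpers are trimmed, so this
is illustrative rather than the literal patch hunk:

static int scsi_dev_type_resume(struct device *dev,
			int (*cb)(struct device *, const struct dev_pm_ops *))
{
	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
	struct scsi_device *sdev = to_scsi_device(dev);
	int err;

	err = cb(dev, pm);		/* driver resume callback */
	scsi_device_resume(sdev);	/* pm_only-- iff the sdev was quiesced */

	if (err == 0) {
		pm_runtime_disable(dev);
		err = pm_runtime_set_active(dev);
		pm_runtime_enable(dev);

		/*
		 * If block layer runtime PM is in use (q->dev != NULL) and
		 * the sdev was runtime suspended before the system suspend,
		 * blk_pre_runtime_suspend() still holds one pm_only
		 * reference; balance it here so waiters on mq_freeze_wq can
		 * make progress again.
		 */
		if (!err && sdev->request_queue->dev &&
		    blk_queue_pm_only(sdev->request_queue))
			blk_post_runtime_resume(sdev->request_queue, 0);
	}
	return err;
}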