lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Thu, 13 Aug 2020 19:52:21 -0700
From:   Bart Van Assche <bvanassche@....org>
To:     Stanley Chu <stanley.chu@...iatek.com>
Cc:     "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
        "martin.petersen@...cle.com" <martin.petersen@...cle.com>,
        "avri.altman@....com" <avri.altman@....com>,
        "alim.akhtar@...sung.com" <alim.akhtar@...sung.com>,
        "jejb@...ux.ibm.com" <jejb@...ux.ibm.com>,
        "cang@...eaurora.org" <cang@...eaurora.org>,
        "beanhuo@...ron.com" <beanhuo@...ron.com>,
        "asutoshd@...eaurora.org" <asutoshd@...eaurora.org>,
        "matthias.bgg@...il.com" <matthias.bgg@...il.com>,
        "linux-mediatek@...ts.infradead.org" 
        <linux-mediatek@...ts.infradead.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Kuohong Wang (王國鴻) 
        <kuohong.wang@...iatek.com>,
        Peter Wang (王信友) 
        <peter.wang@...iatek.com>,
        Chun-Hung Wu (巫駿宏) 
        <Chun-hung.Wu@...iatek.com>,
        Andy Teng (鄧如宏) <Andy.Teng@...iatek.com>,
        Chaotian Jing (井朝天) 
        <Chaotian.Jing@...iatek.com>,
        CC Chou (周志杰) <cc.chou@...iatek.com>,
        Jiajie Hao (郝加节) 
        <jiajie.hao@...iatek.com>
Subject: Re: [PATCH v7] scsi: ufs: Quiesce all scsi devices before shutdown

On 2020-08-13 01:55, Stanley Chu wrote:
> I tried many ways to come out the final solution. Currently two options
> are considered,
> 
> == Option 1 ==
> 	pm_runtime_get_sync(hba->dev);
> 
> 	shost_for_each_device(sdev, hba->host) {
> 		scsi_autopm_get_device(sdev);
> 		if (sdev == hba->sdev_ufs_device)
> 			scsi_device_quiesce(sdev);
> 		else
> 			scsi_remove_device(sdev);
> 	}
> 
> 	ret = ufshcd_suspend(hba, UFS_SHUTDOWN_PM);
> 
> 	scsi_remove_device(hba->sdev_ufs_device);
> 
> Note. Using scsi_autopm_get_device() instead of pm_runtime_disable()
> is to prevent noisy message by below checking,
> 
> 	WARN_ON_ONCE(sdev->quiesced_by && sdev->quiesced_by != current);
> 
> in
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/scsi/scsi_lib.c#n2515
> 
> This warning shows up if we try to quiesce a runtime-suspended SCSI
> device. This is possible during our new shutdown flow. Using
> scsi_autopm_get_device() to resume all SCSI devices first can prevent
> it.
> 
> In addition, normally sd_shutdown() would be executed prior than
> ufshcd_shutdown(). If scsi_remove_device() is invoked by
> ufshcd_shutdown(), sd_shutdown() will be executed again for a SCSI disk
> by
> 
> [  131.398977]  sd_shutdown+0x44/0x118
> [  131.399416]  sd_remove+0x5c/0xc4
> [  131.399824]  device_release_driver_internal+0x1c4/0x2e4
> [  131.400481]  device_release_driver+0x18/0x24
> [  131.401018]  bus_remove_device+0x108/0x134
> [  131.401533]  device_del+0x2dc/0x630
> [  131.401973]  __scsi_remove_device+0xc0/0x174
> [  131.402510]  scsi_remove_device+0x30/0x48
> [  131.403014]  ufshcd_shutdown+0xc8/0x138
> 
> In this case, we could see SYNCHRONIZE_CACHE command will be sent to the
> same SCSI device twice. This is kind of wired during shutdown flow.
> 
> Moreover, in consideration of performance of ufshcd_shutdown(), Option 1
> obviously degrades the latency a lot by scsi_remove_device(). Please see
> the "Performance Measurement" data below.
> 
> Compared Option 2, this way is simpler and also effective. This way may
> be a better compromise.
> 
> == Option 2  ==
> 	pm_runtime_get_sync(hba->dev);
> 
> 	shost_for_each_device(sdev, hba->host) {
> 		scsi_autopm_get_device(sdev);
> 		scsi_device_quiesce(sdev);
> 	}
> 
> == Performance Measurement ==
> As-Is: < 5 ms
> Option 1: 850 ms
> Option 2: 60 ms
> 
> What would you prefer? Or would you have any further suggestions?

Hi Stanley,

Thanks for the detailed report and also for having shared timing information.

The approach of option 2 seems wrong to me because the SCSI devices are not
removed. My concern is that option (2) could cause the sd driver to send SYNC
and/or STOP commands to the device after its PCIe resources have been freed,
resulting in a crash.

Please take a look at the output of the following command:

$ git grep -nHA10 'struct pci_driver.* = {$' */scsi |
  sed -e 's/-/:/' -e 's/-/:/' |
  grep ':[[:blank:]]*\.remove'

It seems to me that other SCSI LLDs do at least the following in their PCIe
removal callback:

1. Call scsi_remove_host()
2. Call scsi_host_put()
3. Call pci_disable_device()

Would that approach work for UFS? Would offlining the UFS LUNs (SDEV_OFFLINE)
before calling the above functions make SCSI host removal faster? See also
scsi_prep_state_check().

Thanks,

Bart.

Powered by blists - more mailing lists