Message-ID: <d7111bd2-d431-e5e1-1a36-6d0d4d4ec19b@quicinc.com>
Date:   Wed, 12 Oct 2022 16:01:01 +0530
From:   Nitin Rawat <quic_nitirawa@...cinc.com>
To:     Peter Wang <peter.wang@...iatek.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>
CC:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1] PM-runtime: Check supplier_preactivated before release
 supplier

Hi Peter/Rafael,
We have also observed a similar issue on our platform. There appears to
be a race condition (explained below) that causes the consumer to resume
without bumping up the supplier's PM-runtime usage counter.

Process 1 (ufshcd_async_scan context)
ufshcd_async_scan()
     scsi_probe_and_add_lun
         scsi_add_lun
             slave_configure    -> enable rpm
                 scsi_sysfs_add_sdev
                     scsi_autopm_get_device
                         device_add     <- invoked sd_probe in process 2
                             scsi_autopm_put_device

Process 2 (sd_probe context)
driver_probe_device
__device_attach_async_helper
     __device_attach_driver
         driver_probe_device
             __driver_probe_device
                 sd_probe
                     scsi_autopm_get_device



The race on dev->power.runtime_status for consumer dev 0:0:0:0 can
happen in the runtime PM framework as follows:

ufshcd_async_scan context (process 1)
scsi_autopm_put_device() //0:0:0:0
	pm_runtime_put_sync()
	__pm_runtime_idle()
	rpm_idle()
	__rpm_callback()
		scsi_runtime_idle()
			pm_runtime_mark_last_busy()
			pm_runtime_autosuspend()
				__pm_runtime_suspend(RPM_AUTO)
				rpm_suspend(RPM_AUTO)
					status = RPM_SUSPENDING
					scsi_runtime_suspend()
						__rpm_callback()
					status = RPM_SUSPENDED------>1
					rpm_suspend_suppliers()
			return -EBUSY

		(use_links) && (dev->power.runtime_status == RPM_RESUMING && retval) ------->3
		__rpm_put_suppliers()





sd_probe context (Process 2)
scsi_autopm_get_device() //0:0:0:0
     __pm_runtime_resume(RPM_GET_PUT)
     rpm_resume
      	status = RPM_RESUMING----->2



After power.runtime_status of consumer 0:0:0:0 was changed to
RPM_SUSPENDED (1), and before scsi_runtime_idle() returned -16 (-EBUSY)
to __rpm_callback(), process 2 changed power.runtime_status of consumer
0:0:0:0 to RPM_RESUMING (2). Condition 3 therefore evaluated to true and
__rpm_put_suppliers() was called, so the consumer resumed with the
supplier's usage count decremented.
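
To make the interleaving easier to follow, here is a minimal
single-threaded sketch in plain C that replays steps 1, 2 and 3 in the
order described above. It is only a toy model, not kernel code: struct
rpm_model, condition3() and replay_race() are hypothetical names, the
supplier counter is a bare int, and I am assuming that the consumer
holds one supplier reference while active, drops it on suspend, and
takes it again on the resume path in process 2. It also models only the
RPM_RESUMING part of the check quoted at step 3.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for the runtime PM status values in the trace. */
enum rpm_status { RPM_ACTIVE, RPM_RESUMING, RPM_SUSPENDED, RPM_SUSPENDING };

/* Hypothetical toy model: one consumer (0:0:0:0) with one supplier. */
struct rpm_model {
	enum rpm_status status;	/* models dev->power.runtime_status */
	bool use_links;		/* consumer has device links */
	int supplier_usage;	/* models the supplier's usage counter */
};

/* The check marked as condition 3 in the trace above. */
static bool condition3(const struct rpm_model *m, int retval)
{
	return m->use_links && m->status == RPM_RESUMING && retval;
}

/* Replay steps 1, 2, 3 in order; return the final supplier usage count. */
static int replay_race(struct rpm_model *m)
{
	int retval;

	/* Assumed start: consumer active, holding one supplier reference. */
	m->status = RPM_ACTIVE;
	m->use_links = true;
	m->supplier_usage = 1;

	/* Step 1 (process 1): rpm_suspend(RPM_AUTO) completes. */
	m->status = RPM_SUSPENDED;
	m->supplier_usage--;	/* assumed: reference dropped on suspend */

	/* Step 2 (process 2): rpm_resume() starts before the idle
	 * callback in process 1 has returned. */
	m->status = RPM_RESUMING;
	m->supplier_usage++;	/* assumed: reference taken on resume */

	/* Step 3 (process 1): scsi_runtime_idle() returns -EBUSY and the
	 * check sees the status written by process 2. */
	retval = -EBUSY;
	if (condition3(m, retval))
		m->supplier_usage--;	/* models __rpm_put_suppliers() */

	/* Process 2 finishes resuming the consumer. */
	m->status = RPM_ACTIVE;
	return m->supplier_usage;
}
```

Calling replay_race() on a struct rpm_model leaves the consumer in
RPM_ACTIVE while the modelled supplier count is back at 0, which is the
state we see on our platform. If the status at step 3 were still
RPM_SUSPENDED, condition3() would be false and the count would stay at 1.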

Please let me know your thoughts on this.

Regards,
Nitin

On 8/2/2022 7:03 PM, Peter Wang wrote:
> 
> On 8/2/22 7:01 PM, Rafael J. Wysocki wrote:
>> On Tue, Aug 2, 2022 at 5:19 AM Peter Wang <peter.wang@...iatek.com> 
>> wrote:
>>>
>>>> Hi Rafael,
>>>>
>>>> Yes, it is very clear!
>>>> I miss this important key point that usage_count is always >
>>>> rpm_active 1.
>>>> I think this patch could work.
>>>>
>>>> Thanks.
>>>> Peter
>>>>
>>>>
>>>>
>>>>
>>> Hi Rafael,
>>>
>>> After test with commit ("887371066039011144b4a94af97d9328df6869a2 PM:
>>> runtime: Fix supplier device management during consumer probe") past 
>>> weeks,
>>> The supplier still suspend when consumer is active "after"
>>> pm_runtime_put_suppliers.
>>> Do you have any idea about that?
>> Well, this means that the consumer probe doesn't bump up the
>> supplier's PM-runtime usage counter as appropriate.
>>
>> You need to tell me more about what happens during the consumer probe.
>> Which driver is this?
> 
> Hi Rafael,
> 
> I have the same idea with you. But I still don't know how it could happen.
> 
> It is upstream ufs driver in scsi system. Here is call flow
> do_scan_async (process 1)
>      do_scsi_scan_host
>          scsi_scan_host_selected
>              scsi_scan_channel
>                  __scsi_scan_target
>                      scsi_probe_and_add_lun
>                          scsi_alloc_sdev
>                              slave_alloc     -> setup link
>                          scsi_add_lun
>                              slave_configure    -> enable rpm
>                              scsi_sysfs_add_sdev
>                                  scsi_autopm_get_device    <- get runtime pm
>                                  device_add                <- invoke sd_probe in process 2
>                                  scsi_autopm_put_device    <- put runtime pm, point 1
> 
> driver_probe_device (process 2)
>      __driver_probe_device
>          pm_runtime_get_suppliers
>              really_probe
>                  sd_probe
>                      scsi_autopm_get_device                <- get runtime pm, point 2
>                      pm_runtime_set_autosuspend_delay    <- set rpm delay to 2s
>                      scsi_autopm_put_device                <- put runtime pm
>          pm_runtime_put_suppliers                        <- (link->rpm_active = 1)
> 
> After process 1 call scsi_autopm_put_device(point 1) let consumer enter suspend,
> process 2 call scsi_autopm_get_device(point 2) may have chance resume consumer but not
> bump up the supplier's PM-runtime usage counter as appropriate.
> 
> Thanks.
> Peter
> 
> 
> 
> 
> 
> 
> 
