linux-kernel - Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <521C0907.6000500@cn.fujitsu.com>
Date:	Tue, 27 Aug 2013 10:03:51 +0800
From:	Gu Zheng <guz.fnst@...fujitsu.com>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
CC:	ACPI Devel Maling List <linux-acpi@...r.kernel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Toshi Kani <toshi.kani@...com>,
	LKML <linux-kernel@...r.kernel.org>,
	Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>
Subject: Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

Hi Rafael,

On 08/26/2013 10:43 PM, Rafael J. Wysocki wrote:

> On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
>> On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
>>> Hi Rafael,
>>
>> Hi,
>>
>>> On 08/26/2013 04:09 AM, Rafael J. Wysocki wrote:
>>>
>>>> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>>>
>>>> There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
>>>> around the acpi_bus_trim() call in acpi_scan_hot_remove() which
>>>> generally removes devices (it removes ACPI device objects at least,
>>>> but it may also remove "physical" device objects through .detach()
>>>> callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
>>>> attributes are removed under these locks and to remove those
>>>> attributes it is necessary to hold the s_active references of their
>>>> directory entries for writing.
>>>>
>>>> On the other hand, the execution of a .show() or .store() callback
>>>> from a sysfs attribute is carried out with that attribute's s_active
>>>> reference held for reading.  Consequently, if any device sysfs
>>>> attribute that may be removed from within acpi_scan_hot_remove()
>>>> through acpi_bus_trim() has a .store() or .show() callback which
>>>> acquires either acpi_scan_lock or device_hotplug_lock, the execution
>>>> of that callback may deadlock with the removal of the attribute.
>>>> [Unfortunately, the "online" device attribute of CPUs and memory
>>>> blocks and the "eject" attribute of ACPI device objects are affected
>>>> by this issue.]
>>>>
>>>> To avoid those deadlocks introduce a new protection mechanism that
>>>> can be used by the device sysfs attributes in question.  Namely,
>>>> if a device sysfs attribute's .store() or .show() callback routine
>>>> is about to acquire device_hotplug_lock or acpi_scan_lock, it can
>>>> first execute read_lock_device_remove() and return an error code if
>>>> that function returns false.  If true is returned, the lock in
>>>> question may be acquired and read_unlock_device_remove() must be
>>>> called.  [This mechanism is implemented by means of an additional
>>>> rwsem in drivers/base/core.c.]
>>>>
>>>> Make the affected sysfs attributes in the driver core and ACPI core
>>>> use read_lock_device_remove() and read_unlock_device_remove() as
>>>> described above.
>>>>
>>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>>> Reported-by: Gu Zheng <guz.fnst@...fujitsu.com>
>>>
>>> I'm sorry to forget to mention that the original reporter is
>>> Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>. I continued
>>> the investigation and found more issues.
>>>
>>> We tested this patch on kernel 3.11-rc6, but it seems that the
>>> issue is still there. Detail info as following.
>>
>> Well, taking pm_mutex under acpi_scan_lock (trace #2) is a bad idea anyway,
>> because we'll need to take acpi_scan_lock during system suspend for PCI hot
>> remove to work and that's under pm_mutex.  So I wonder if we can simply
>> drop the system sleep locking from lock/unlock_memory_hotplug().  But that's
>> a side note, because dropping it won't help here.
>>
>> Now ->
>>
>>> ======================================================                                                                                                                           
>>> [ INFO: possible circular locking dependency detected ]                                                                                                                          
>>> 3.11.0-rc6-lockdebug-refea+ #162 Tainted: GF                                                                                                                                     
>>> -------------------------------------------------------                                                                                                                          
>>> kworker/0:2/754 is trying to acquire lock:                                                                                                                                       
>>>  (s_active#73){++++.+}, at: [<ffffffff8121062b>] sysfs_addrm_finish+0x3b/0x70                                                                                                    
>>>                                                                                                                                                                                  
>>> but task is already holding lock:                                                                                                                                                
>>>  (mem_sysfs_mutex){+.+.+.}, at: [<ffffffff813b949d>] remove_memory_block+0x1d/0xa0                                                                                               
>>>                                                                                                                                                                                  
>>> which lock already depends on the new lock.                                                                                                                                      
>>>                                                                                                                                                                                  
>>>                                                                                                                                                                                  
>>> the existing dependency chain (in reverse order) is:                                                                                                                             
>>>                                                                                                                                                                                  
>>> -> #4 (mem_sysfs_mutex){+.+.+.}:                                                                                                                                                 
>>>        [<ffffffff810ba88c>] validate_chain+0x70c/0x870                                                                                                                           
>>>        [<ffffffff810bad5f>] __lock_acquire+0x36f/0x5f0                                                                                                                           
>>>        [<ffffffff810bb080>] lock_acquire+0xa0/0x130                                                                                                                              
>>>        [<ffffffff8159781b>] mutex_lock_nested+0x7b/0x3b0                                                                                                                         
>>>        [<ffffffff813b9361>] add_memory_section+0x51/0x150                                                                                                                        
>>>        [<ffffffff813b947b>] register_new_memory+0x1b/0x20                                                                                                                        
>>>        [<ffffffff81590d21>] __add_pages+0x111/0x120                                                                                                                              
>>>        [<ffffffff81041224>] arch_add_memory+0x64/0xf0                                                                                                                            
>>>        [<ffffffff81590e07>] add_memory+0xd7/0x1e0                                                                                                                                
>>>        [<ffffffff81347db4>] acpi_memory_enable_device+0x77/0x12b                                                                                                                 
>>>        [<ffffffff8134801e>] acpi_memory_device_add+0x18f/0x198                                                                                                                   
>>>        [<ffffffff8131aa5a>] acpi_bus_device_attach+0x9a/0x100                                                                                                                    
>>>        [<ffffffff8133666d>] acpi_ns_walk_namespace+0xbe/0x17d                                                                                                                    
>>>        [<ffffffff81336b03>] acpi_walk_namespace+0x8a/0xc4                                                                                                                        
>>>        [<ffffffff8131c126>] acpi_bus_scan+0x91/0x9c                                                                                                                              
>>>        [<ffffffff8131c1af>] acpi_scan_bus_device_check+0x7e/0x10f                                                                                                                
>>>        [<ffffffff8131c253>] acpi_scan_device_check+0x13/0x15                                                                                                                     
>>>        [<ffffffff81315dac>] acpi_os_execute_deferred+0x27/0x34                                                                                                                   
>>>        [<ffffffff8106bec8>] process_one_work+0x1e8/0x560                                                                                                                         
>>>        [<ffffffff8106d0a0>] worker_thread+0x120/0x3a0                                                                                                                            
>>>        [<ffffffff81073b5e>] kthread+0xee/0x100                                                                                                                                   
>>>        [<ffffffff815a605c>] ret_from_fork+0x7c/0xb0                                                                                                                              
>>>                                                                                                                                                                                  
>>> -> #3 (pm_mutex){+.+.+.}:                                                                                                                                                        
>>>        [<ffffffff810ba88c>] validate_chain+0x70c/0x870                                                                                                                           
>>>        [<ffffffff810bad5f>] __lock_acquire+0x36f/0x5f0                                                                                                                           
>>>        [<ffffffff810bb080>] lock_acquire+0xa0/0x130                                                                                                                              
>>>        [<ffffffff8159781b>] mutex_lock_nested+0x7b/0x3b0                                                                                                                         
>>>        [<ffffffff81188795>] lock_memory_hotplug+0x35/0x40                                                                                                                        
>>>        [<ffffffff81590d5f>] add_memory+0x2f/0x1e0                                                                                                                                
>>>        [<ffffffff81347db4>] acpi_memory_enable_device+0x77/0x12b                                                                                                                 
>>>        [<ffffffff8134801e>] acpi_memory_device_add+0x18f/0x198                                                                                                                   
>>>        [<ffffffff8131aa5a>] acpi_bus_device_attach+0x9a/0x100                                                                                                                    
>>>        [<ffffffff8133666d>] acpi_ns_walk_namespace+0xbe/0x17d                                                                                                                    
>>>        [<ffffffff81336b03>] acpi_walk_namespace+0x8a/0xc4                                                                                                                        
>>>        [<ffffffff8131c126>] acpi_bus_scan+0x91/0x9c                                                                                                                              
>>>        [<ffffffff81d39862>] acpi_scan_init+0x66/0x15a                                                                                                                            
>>>        [<ffffffff81d397a0>] acpi_init+0xa1/0xbb                                                                                                                                  
>>>        [<ffffffff810002c2>] do_one_initcall+0xf2/0x1a0                                                                                                                           
>>>        [<ffffffff81d018da>] do_basic_setup+0x9d/0xbb                                                                                                                             
>>>        [<ffffffff81d01b02>] kernel_init_freeable+0x20a/0x28a                                                                                                                     
>>>        [<ffffffff8158f20e>] kernel_init+0xe/0xf0                                                                                                                                 
>>>        [<ffffffff815a605c>] ret_from_fork+0x7c/0xb0                                                                                                                              
>>>                                                                                                                                                                                  
>>> -> #2 (mem_hotplug_mutex){+.+.+.}:                                                                                                                                               
>>>        [<ffffffff810ba88c>] validate_chain+0x70c/0x870                                                                                                                           
>>>        [<ffffffff810bad5f>] __lock_acquire+0x36f/0x5f0                                                                                                                           
>>>        [<ffffffff810bb080>] lock_acquire+0xa0/0x130                                                                                                                              
>>>        [<ffffffff8159781b>] mutex_lock_nested+0x7b/0x3b0                                                                                                                         
>>>        [<ffffffff81188777>] lock_memory_hotplug+0x17/0x40                                                                                                                        
>>>        [<ffffffff81590d5f>] add_memory+0x2f/0x1e0                                                                                                                                
>>>        [<ffffffff81347db4>] acpi_memory_enable_device+0x77/0x12b                                                                                                                 
>>>        [<ffffffff8134801e>] acpi_memory_device_add+0x18f/0x198                                                                                                                   
>>>        [<ffffffff8131aa5a>] acpi_bus_device_attach+0x9a/0x100                                                                                                                    
>>>        [<ffffffff8133666d>] acpi_ns_walk_namespace+0xbe/0x17d                                                                                                                    
>>>        [<ffffffff81336b03>] acpi_walk_namespace+0x8a/0xc4                                                                                                                        
>>>        [<ffffffff8131c126>] acpi_bus_scan+0x91/0x9c                                                                                                                              
>>>        [<ffffffff8131c1af>] acpi_scan_bus_device_check+0x7e/0x10f                                                                                                                
>>>        [<ffffffff8131c253>] acpi_scan_device_check+0x13/0x15                                                                                                                     
>>>        [<ffffffff81315dac>] acpi_os_execute_deferred+0x27/0x34                                                                                                                   
>>>        [<ffffffff8106bec8>] process_one_work+0x1e8/0x560                                                                                                                         
>>>        [<ffffffff8106d0a0>] worker_thread+0x120/0x3a0                                                                                                                            
>>>        [<ffffffff81073b5e>] kthread+0xee/0x100                                                                                                                                   
>>>        [<ffffffff815a605c>] ret_from_fork+0x7c/0xb0                                                                                                                              
>>>                                                                                                                                                                                  
>>> -> #1 (device_hotplug_lock){+.+.+.}:                                                                                                                                             
>>>        [<ffffffff810ba88c>] validate_chain+0x70c/0x870                                                                                                                           
>>>        [<ffffffff810bad5f>] __lock_acquire+0x36f/0x5f0                                                                                                                           
>>>        [<ffffffff810bb080>] lock_acquire+0xa0/0x130                                                                                                                              
>>>        [<ffffffff8159781b>] mutex_lock_nested+0x7b/0x3b0                                                                                                                         
>>>        [<ffffffff813a16e7>] lock_device_hotplug+0x17/0x20
>>
>> -> we should have grabbed device_remove_rwsem for reading here with the patch
>> applied, which means that -->
>>
>>>        [<ffffffff813a2553>] show_online+0x33/0x80                                                                                                                                
>>>        [<ffffffff813a1ce7>] dev_attr_show+0x27/0x50                                                                                                                              
>>>        [<ffffffff8120ee94>] fill_read_buffer+0x74/0x100                                                                                                                          
>>>        [<ffffffff8120efcc>] sysfs_read_file+0xac/0xe0                                                                                                                            
>>>        [<ffffffff81195d21>] vfs_read+0xb1/0x130                                                                                                                                  
>>>        [<ffffffff81196232>] SyS_read+0x62/0xb0                                                                                                                                   
>>>        [<ffffffff815a6102>] system_call_fastpath+0x16/0x1b                                                                                                                       
>>>                                                                                                                                                                                  
>>> -> #0 (s_active#73){++++.+}:                                                                                                                                                     
>>>        [<ffffffff810ba148>] check_prev_add+0x598/0x5d0                                                                                                                           
>>>        [<ffffffff810ba88c>] validate_chain+0x70c/0x870                                                                                                                           
>>>        [<ffffffff810bad5f>] __lock_acquire+0x36f/0x5f0                                                                                                                           
>>>        [<ffffffff810bb080>] lock_acquire+0xa0/0x130                                                                                                                              
>>>        [<ffffffff8120fed7>] sysfs_deactivate+0x157/0x1c0                                                                                                                         
>>>        [<ffffffff8121062b>] sysfs_addrm_finish+0x3b/0x70                                                                                                                         
>>>        [<ffffffff8120e280>] sysfs_hash_and_remove+0x60/0xb0                                                                                                                      
>>>        [<ffffffff8120f2a9>] sysfs_remove_file+0x39/0x50                                                                                                                          
>>>        [<ffffffff813a2977>] device_remove_file+0x17/0x20                                                                                                                         
>>>        [<ffffffff813a29aa>] device_remove_attrs+0x2a/0xe0                                                                                                                        
>>>        [<ffffffff813a2b8b>] device_del+0x12b/0x1f0                                                                                                                               
>>>        [<ffffffff813a2c72>] device_unregister+0x22/0x60                                                                                                                          
>>>        [<ffffffff813b94ed>] remove_memory_block+0x6d/0xa0                                                                                                                        
>>>        [<ffffffff813b953f>] unregister_memory_section+0x1f/0x30                                                                                                                  
>>>        [<ffffffff81189468>] __remove_pages+0xc8/0x150                                                                                                                            
>>>        [<ffffffff8158f994>] arch_remove_memory+0x94/0xd0                                                                                                                         
>>>        [<ffffffff81590bef>] remove_memory+0x6f/0x90                                                                                                                              
>>>        [<ffffffff81347ccc>] acpi_memory_remove_memory+0x7e/0xa3                                                                                                                  
>>>        [<ffffffff81347d18>] acpi_memory_device_remove+0x27/0x33                                                                                                                  
>>>        [<ffffffff8131a8a2>] acpi_bus_device_detach+0x3d/0x5e                                                                                                                     
>>>        [<ffffffff81336685>] acpi_ns_walk_namespace+0xd6/0x17d                                                                                                                    
>>>        [<ffffffff81336b03>] acpi_walk_namespace+0x8a/0xc4                                                                                                                        
>>>        [<ffffffff8131a8f6>] acpi_bus_trim+0x33/0x7a                                                                                                                              
>>>        [<ffffffff8131b4c0>] acpi_scan_hot_remove+0x160/0x281                                                                                                                     
>>
>> --> device_hotplug_lock is already held by show_online() at this point (or if
>> it is not held, that function will return -EBUSY after attempting to grab
>> device_remove_rwsem for reading), so acpi_scan_hot_remove() will wait for it
>> to be released before calling acpi_bus_trim().
>>
>> So the situation in which one thread holds s_active for reading and blocks on
>> device_hotplug_lock while another thread is holding it over device removal is
>> clearly impossible to me.
>>
>> So I'm not sure how device_hotplug_lock is still involved?
> 
> Well, it is not involved, but lockdep doesn't see that anyway.  Bummer.
> 
>>>        [<ffffffff8131b6fc>] acpi_bus_hot_remove_device+0x37/0x73                                                                                                                 
>>>        [<ffffffff81315dac>] acpi_os_execute_deferred+0x27/0x34                                                                                                                   
>>>        [<ffffffff8106bec8>] process_one_work+0x1e8/0x560                                                                                                                         
>>>        [<ffffffff8106d0a0>] worker_thread+0x120/0x3a0                                                                                                                            
>>>        [<ffffffff81073b5e>] kthread+0xee/0x100                                                                                                                                   
>>>        [<ffffffff815a605c>] ret_from_fork+0x7c/0xb0                                                                                                                              
>>>                                                                                                                                                                                  
>>> other info that might help us debug this:                                                                                                                                        
>>>                                                                                                                                                                                  
>>> Chain exists of:                                                                                                                                                                 
>>>   s_active#73 --> pm_mutex --> mem_sysfs_mutex                                                                                                                                   
>>>                                                                                                                                                                                  
>>>  Possible unsafe locking scenario:                                                                                                                                               
>>>                                                                                                                                                                                  
>>>        CPU0                    CPU1                                                                                                                                              
>>>        ----                    ----                                                                                                                                              
>>>   lock(mem_sysfs_mutex);                                                                                                                                                         
>>>                                lock(pm_mutex);                                                                                                                                   
>>>                                lock(mem_sysfs_mutex);                                                                                                                            
>>>   lock(s_active#73);                                                                                                                                                             
>>>                                                                                                                                                                                  
>>>  *** DEADLOCK ***                                                                                                                                                                
> 
> OK, so the patch below is quick and dirty and overkill, but it should make the
> splat go away at least.

Yeah, this patch is the same as the one I attached in previous discussion threads.
It does make the splat go away, but as you said, it's a bit overkill.

Regards,
Gu

> 
> Thanks,
> Rafael
> 
> 
> ---
>  drivers/acpi/scan.c |    3 ++-
>  drivers/base/core.c |   36 ++++++++++++++++++++----------------
>  2 files changed, 22 insertions(+), 17 deletions(-)
> 
> Index: linux-pm/drivers/base/core.c
> ===================================================================
> --- linux-pm.orig/drivers/base/core.c
> +++ linux-pm/drivers/base/core.c
> @@ -49,6 +49,18 @@ static struct kobject *dev_kobj;
>  struct kobject *sysfs_dev_char_kobj;
>  struct kobject *sysfs_dev_block_kobj;
>  
> +static DEFINE_MUTEX(device_hotplug_lock);
> +
> +void lock_device_hotplug(void)
> +{
> +	mutex_lock(&device_hotplug_lock);
> +}
> +
> +void unlock_device_hotplug(void)
> +{
> +	mutex_unlock(&device_hotplug_lock);
> +}
> +
>  #ifdef CONFIG_BLOCK
>  static inline int device_is_not_partition(struct device *dev)
>  {
> @@ -408,9 +420,11 @@ static ssize_t show_online(struct device
>  {
>  	bool val;
>  
> -	lock_device_hotplug();
> +	if (!mutex_trylock(&device_hotplug_lock))
> +		return -EAGAIN;
> +
>  	val = !dev->offline;
> -	unlock_device_hotplug();
> +	mutex_unlock(&device_hotplug_lock);
>  	return sprintf(buf, "%u\n", val);
>  }
>  
> @@ -424,9 +438,11 @@ static ssize_t store_online(struct devic
>  	if (ret < 0)
>  		return ret;
>  
> -	lock_device_hotplug();
> +	if (!mutex_trylock(&device_hotplug_lock))
> +		return -EAGAIN;
> +
>  	ret = val ? device_online(dev) : device_offline(dev);
> -	unlock_device_hotplug();
> +	mutex_unlock(&device_hotplug_lock);
>  	return ret < 0 ? ret : count;
>  }
>  
> @@ -1479,18 +1495,6 @@ EXPORT_SYMBOL_GPL(put_device);
>  EXPORT_SYMBOL_GPL(device_create_file);
>  EXPORT_SYMBOL_GPL(device_remove_file);
>  
> -static DEFINE_MUTEX(device_hotplug_lock);
> -
> -void lock_device_hotplug(void)
> -{
> -	mutex_lock(&device_hotplug_lock);
> -}
> -
> -void unlock_device_hotplug(void)
> -{
> -	mutex_unlock(&device_hotplug_lock);
> -}
> -
>  static int device_check_offline(struct device *dev, void *not_used)
>  {
>  	int ret;
> Index: linux-pm/drivers/acpi/scan.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/scan.c
> +++ linux-pm/drivers/acpi/scan.c
> @@ -510,7 +510,8 @@ acpi_eject_store(struct device *d, struc
>  	if (ACPI_FAILURE(status) || !acpi_device->flags.ejectable)
>  		return -ENODEV;
>  
> -	mutex_lock(&acpi_scan_lock);
> +	if (!mutex_trylock(&acpi_scan_lock))
> +		return -EAGAIN;
>  
>  	if (acpi_device->flags.eject_pending) {
>  		/* ACPI eject notification event. */
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/