Date:	Tue, 1 Mar 2011 23:17:34 -0600
From:	Shaun Ruffell <sruffell@...ffell.net>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Russ Meyerriecks <rmeyerriecks@...ium.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Greg KH <greg@...ah.com>
Subject: Re: [PATCH] mm/dmapool.c: Do not create/destroy sysfs file while
	holding pools_lock

On Tue, Mar 01, 2011 at 05:01:17PM -0800, Andrew Morton wrote:
> On Mon, 28 Feb 2011 16:41:24 -0600
> Russ Meyerriecks <rmeyerriecks@...ium.com> wrote:
> 
> > From: Shaun Ruffell <sruffell@...ium.com>
> > 
> > Eliminates a circular lock dependency reported by lockdep. When reading the
> > "pools" file from a PCI device via sysfs, the s_active lock is acquired before
> > the pools_lock. When unloading the driver and destroying the pool, pools_lock
> > is acquired before the s_active lock.
> > 
> >  cat/12016 is trying to acquire lock:
> >   (pools_lock){+.+.+.}, at: [<c04ef113>] show_pools+0x43/0x140
> > 
> >  but task is already holding lock:
> >   (s_active#82){++++.+}, at: [<c0554e1b>] sysfs_read_file+0xab/0x160
> > 
> >  which lock already depends on the new lock.
> 
> sysfs_dirent_init_lockdep() and the 6992f53349 ("sysfs: Use one lockdep
> class per sysfs attribute") which added it are rather scary.
> 
> The alleged bug appears to be due to taking pools_lock outside
> device_create_file() (which takes magical sysfs PseudoVirtualLocks)
> versus show_pools(), which takes pools_lock but is called from inside
> magical sysfs PseudoVirtualLocks.
> 
> I don't know if this is actually a real bug or not.  Probably not, as
> this device_create_file() does not match the reasons for 6992f53349:
> "There is a sysfs idiom where writing to one sysfs file causes the
> addition or removal of other sysfs files".  But that's a guess.
> 
> > --- a/mm/dmapool.c
> > +++ b/mm/dmapool.c
> > @@ -174,21 +174,28 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
> >  	init_waitqueue_head(&retval->waitq);
> >  
> >  	if (dev) {
> > -		int ret;
> > +		int first_pool;
> >  
> >  		mutex_lock(&pools_lock);
> >  		if (list_empty(&dev->dma_pools))
> > -			ret = device_create_file(dev, &dev_attr_pools);
> > +			first_pool = 1;
> >  		else
> > -			ret = 0;
> > +			first_pool = 0;
> >  		/* note:  not currently insisting "name" be unique */
> > -		if (!ret)
> > -			list_add(&retval->pools, &dev->dma_pools);
> > -		else {
> > -			kfree(retval);
> > -			retval = NULL;
> > -		}
> > +		list_add(&retval->pools, &dev->dma_pools);
> >  		mutex_unlock(&pools_lock);
> > +
> > +		if (first_pool) {
> > +			int ret;
> > +			ret = device_create_file(dev, &dev_attr_pools);
> > +			if (ret) {
> > +				mutex_lock(&pools_lock);
> > +				list_del(&retval->pools);
> > +				mutex_unlock(&pools_lock);
> > +				kfree(retval);
> > +				retval = NULL;
> > +			}
> > +		}
> 
> Not a good fix, IMO.  The problem is that if two CPUs concurrently call
> dma_pool_create(), the first CPU will spend time creating the sysfs
> file.  Meanwhile, the second CPU will whizz straight back to its
> caller.  The caller now thinks that the sysfs file has been created and
> returns to userspace, which immediately tries to read the sysfs file. 
> But the first CPU hasn't finished creating it yet.  Userspace fails.
> 
> One way of fixing this would be to create another singleton lock:
> 
> 
> 	{
> 		static DEFINE_MUTEX(pools_sysfs_lock);
> 		static bool pools_sysfs_done;
> 
> 		mutex_lock(&pools_sysfs_lock);
> 		if (pools_sysfs_done == false) {
> 			create_sysfs_stuff();
> 			pools_sysfs_done = true;
> 		}
> 		mutex_unlock(&pools_sysfs_lock);
> 	}
> 

If I am following, I do not believe a static pools_sysfs_done flag will
work, since one "pools" file is created in sysfs for each device that
creates one or more dma pools. A static flag like that would be set by
the first device and so would fail for any additional devices.
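
To illustrate the concern, here is a minimal userspace sketch (not kernel
code). struct fake_device, fake_create_file() and the two
pool_create_*() helpers are hypothetical stand-ins invented for this
example; nr_pools plays the role of list_empty(&dev->dma_pools).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device: tracks only what matters here. */
struct fake_device {
	bool has_pools_file;	/* does this device have its "pools" file? */
	int nr_pools;		/* number of dma pools on this device */
};

/* Stand-in for device_create_file(); always succeeds in this sketch. */
static int fake_create_file(struct fake_device *dev)
{
	assert(dev != NULL);
	dev->has_pools_file = true;
	return 0;
}

/* Variant A: one static flag shared by every device. */
static void pool_create_static_flag(struct fake_device *dev)
{
	static bool pools_sysfs_done;

	if (!pools_sysfs_done) {
		fake_create_file(dev);
		pools_sysfs_done = true;	/* now set for ALL devices */
	}
	dev->nr_pools++;
}

/* Variant B: decide per device, like list_empty(&dev->dma_pools). */
static void pool_create_per_device(struct fake_device *dev)
{
	if (dev->nr_pools == 0)
		fake_create_file(dev);
	dev->nr_pools++;
}
```

With variant A, the second device to create a pool never gets its file;
with variant B, every device gets one on its first pool.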

Assuming that lockdep has uncovered a real bug (I'm not 100% clear on
all the reasons that sysfs PseudoVirtualLocks are needed as opposed
to regular locks), what do you think about something like:

mm/dmapool.c: Do not create/destroy sysfs file while holding pools_lock

Eliminates a circular lock dependency reported by lockdep. When reading the
"pools" file from a PCI device via sysfs, the s_active lock is acquired before
the pools_lock. When unloading the driver and destroying the pool, pools_lock
is acquired before the s_active lock.

 cat/12016 is trying to acquire lock:
  (pools_lock){+.+.+.}, at: [<c04ef113>] show_pools+0x43/0x140

 but task is already holding lock:
  (s_active#82){++++.+}, at: [<c0554e1b>] sysfs_read_file+0xab/0x160

 which lock already depends on the new lock.

This introduces a new pools_sysfs_lock that is used to synchronize
'pools' attribute creation / destruction without requiring 'pools_lock'
to be held.

Signed-off-by: Shaun Ruffell <sruffell@...ium.com>
Signed-off-by: Russ Meyerriecks <rmeyerriecks@...ium.com>
---
 mm/dmapool.c |   37 +++++++++++++++++++++++++++----------
 1 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/mm/dmapool.c b/mm/dmapool.c
index 03bf3bb..b0dd40c 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -64,6 +64,7 @@ struct dma_page {		/* cacheable header for 'allocation' bytes */
 #define	POOL_TIMEOUT_JIFFIES	((100 /* msec */ * HZ) / 1000)
 
 static DEFINE_MUTEX(pools_lock);
+static DEFINE_MUTEX(pools_sysfs_lock);
 
 static ssize_t
 show_pools(struct device *dev, struct device_attribute *attr, char *buf)
@@ -174,21 +175,28 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
 	init_waitqueue_head(&retval->waitq);
 
 	if (dev) {
-		int ret;
+		int first_pool;
 
+		mutex_lock(&pools_sysfs_lock);
 		mutex_lock(&pools_lock);
 		if (list_empty(&dev->dma_pools))
-			ret = device_create_file(dev, &dev_attr_pools);
+			first_pool = 1;
 		else
-			ret = 0;
+			first_pool = 0;
 		/* note:  not currently insisting "name" be unique */
-		if (!ret)
-			list_add(&retval->pools, &dev->dma_pools);
-		else {
-			kfree(retval);
-			retval = NULL;
-		}
+		list_add(&retval->pools, &dev->dma_pools);
 		mutex_unlock(&pools_lock);
+
+		if (first_pool) {
+			if (device_create_file(dev, &dev_attr_pools)) {
+				mutex_lock(&pools_lock);
+				list_del(&retval->pools);
+				mutex_unlock(&pools_lock);
+				kfree(retval);
+				retval = NULL;
+			}
+		}
+		mutex_unlock(&pools_sysfs_lock);
 	} else
 		INIT_LIST_HEAD(&retval->pools);
 
@@ -263,12 +271,21 @@ static void pool_free_page(struct dma_pool *pool, struct dma_page *page)
  */
 void dma_pool_destroy(struct dma_pool *pool)
 {
+	int last_pool;
+
+	mutex_lock(&pools_sysfs_lock);
 	mutex_lock(&pools_lock);
 	list_del(&pool->pools);
 	if (pool->dev && list_empty(&pool->dev->dma_pools))
-		device_remove_file(pool->dev, &dev_attr_pools);
+		last_pool = 1;
+	else
+		last_pool = 0;
 	mutex_unlock(&pools_lock);
 
+	if (last_pool)
+		device_remove_file(pool->dev, &dev_attr_pools);
+	mutex_unlock(&pools_sysfs_lock);
+
 	while (!list_empty(&pool->page_list)) {
 		struct dma_page *page;
 		page = list_entry(pool->page_list.next,
-- 
1.7.4
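
For anyone who wants to poke at the lock nesting outside the kernel, the
pattern in the patch can be sketched in userspace with pthreads. This is
only a rough model under stated assumptions: struct fake_device and the
fake_pool_*() helpers are made up for the example, list_lock stands in
for pools_lock, sysfs_lock stands in for pools_sysfs_lock, and setting
has_pools_file stands in for device_create_file() /
device_remove_file(). The error path for a failed file creation is left
out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;	/* plays pools_lock */
static pthread_mutex_t sysfs_lock = PTHREAD_MUTEX_INITIALIZER;	/* plays pools_sysfs_lock */

/* Hypothetical stand-in for struct device. */
struct fake_device {
	int nr_pools;		/* protected by list_lock */
	bool has_pools_file;	/* protected by sysfs_lock */
};

static void fake_pool_create(struct fake_device *dev)
{
	bool first_pool;

	assert(dev != NULL);

	/* Outer lock serializes file creation/removal across callers. */
	pthread_mutex_lock(&sysfs_lock);

	/* Inner lock is held only while the pool list is touched. */
	pthread_mutex_lock(&list_lock);
	first_pool = (dev->nr_pools == 0);
	dev->nr_pools++;
	pthread_mutex_unlock(&list_lock);

	/* "Create the file" with list_lock dropped. */
	if (first_pool)
		dev->has_pools_file = true;

	pthread_mutex_unlock(&sysfs_lock);
}

static void fake_pool_destroy(struct fake_device *dev)
{
	bool last_pool;

	assert(dev != NULL && dev->nr_pools > 0);

	pthread_mutex_lock(&sysfs_lock);

	pthread_mutex_lock(&list_lock);
	dev->nr_pools--;
	last_pool = (dev->nr_pools == 0);
	pthread_mutex_unlock(&list_lock);

	/* "Remove the file" with list_lock dropped. */
	if (last_pool)
		dev->has_pools_file = false;

	pthread_mutex_unlock(&sysfs_lock);
}
```

The point of the outer lock is that a second caller cannot return from
fake_pool_create() until the first caller has finished creating the
file, while the file operation itself never runs under list_lock.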
