[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1490347486.2766.17.camel@sipsolutions.net>
Date: Fri, 24 Mar 2017 10:24:46 +0100
From: Johannes Berg <johannes@...solutions.net>
To: linux-kernel <linux-kernel@...r.kernel.org>
Cc: Nicolai Stange <nicstange@...il.com>,
"Paul E.McKenney" <paulmck@...ux.vnet.ibm.com>,
gregkh <gregkh@...uxfoundation.org>, sharon.dvir@...el.com,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
linux-wireless <linux-wireless@...r.kernel.org>
Subject: Re: deadlock in synchronize_srcu() in debugfs?
Hi,
On Fri, 2017-03-24 at 09:56 +0100, Johannes Berg wrote:
> On Thu, 2017-03-23 at 16:29 +0100, Johannes Berg wrote:
> > Isn't it possible for the following to happen?
> >
> > CPU1 CPU2
> >
> > mutex_lock(&M); // acquires mutex
> > full_proxy_xyz();
> > srcu_read_lock(&debugfs_srcu);
> > real_fops->xyz();
> > mutex_lock(&M); // waiting for mutex
> > debugfs_remove(F);
> > synchronize_srcu(&debugfs_srcu);
> So I'm pretty sure that this can happen. I'm not convinced that it's
> happening here, but still.
I'm a bit confused, in that SRCU, of course, doesn't wait until all the
readers are done - that'd be a regular reader/writer lock or something.
However, it does (have to) wait until all the currently active read-
side sections have terminated, which still leads to a deadlock in the
example above, I think?
In his 2006 LWN article Paul wrote:
The designer of a given subsystem is responsible for: (1) ensuring
that SRCU read-side sleeping is bounded and (2) limiting the amount
of memory waiting for synchronize_srcu(). [1]
In the case of debugfs files acquiring locks, (1) can't really be
guaranteed, especially if those locks can be held while doing
synchronize_srcu() [via debugfs_remove], so I still think the lockdep
annotation needs to be changed to at least have some annotation at
synchronize_srcu() time so we can detect this.
Now, I still suspect there's some other bug here in the case that I'm
seeing, because I don't actually see the "mutex_lock(&M); // waiting"
piece in the traces. I'll need to run this with some tracing on Monday
when the test guys are back from the weekend.
I'm also not sure how I can possibly fix this in debugfs in mac80211
and friends, but that's perhaps a different story. Clearly, this
debugfs patch is a good thing - the code will likely have had use-
after-free problems in this situation without it. But flagging the
potential deadlocks would make it a lot easier to find them.
johannes
Powered by blists - more mailing lists