[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20121221230849.GB29739@fieldses.org>
Date: Fri, 21 Dec 2012 18:08:49 -0500
From: "J. Bruce Fields" <bfields@...ldses.org>
To: "Myklebust, Trond" <Trond.Myklebust@...app.com>
Cc: Dave Jones <davej@...hat.com>,
Linux Kernel <linux-kernel@...r.kernel.org>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
"Adamson, Dros" <Weston.Adamson@...app.com>
Subject: Re: nfsd oops on Linus' current tree.
On Fri, Dec 21, 2012 at 06:40:54PM +0000, Myklebust, Trond wrote:
> On Fri, 2012-12-21 at 13:08 -0500, J. Bruce Fields wrote:
> > On Fri, Dec 21, 2012 at 10:33:48AM -0500, Dave Jones wrote:
> > > Did a mount from a client (also running Linus current), and the
> > > server spat this out..
> > >
> > > [ 6936.306135] ------------[ cut here ]------------
> > > [ 6936.306154] WARNING: at net/sunrpc/clnt.c:617 rpc_shutdown_client+0x12a/0x1b0 [sunrpc]()
> >
> > This is a warning added by 168e4b39d1afb79a7e3ea6c3bb246b4c82c6bdb9
> > "SUNRPC: add WARN_ON_ONCE for potential deadlock", pointing out that
> > nfsd is calling shutdown_client from a workqueue, which is a problem
> > because shutdown_client has to wait on rpc tasks that run on a
> > workqueue.
> >
> > I don't believe there's any circular dependency among the workqueues
> > (we're calling shutdown_client from callback_wq, not rpciod_workqueue),
>
> We were getting deadlocks with rpciod when calling rpc_shutdown_client
> from the nfsiod workqueue.
>
> The problem here is that the workqueues all run using the same pool of
> threads, and so you can get "interesting" deadlocks when one of these
> threads has to wait for another one.
OK, coming back after reviewing Documentation/workqueue.txt: the
workqueues potentially involved here are rpciod and the server's
laundromat and callback workqueues.
The former is created with
alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 1);
and the latter two are both created with
create_singlethread_workqueue(), which also sets WQ_MEM_RECLAIM.
As I understand it, that ensures that each of the three has at least one
"rescuer" thread dedicated to it--so there shouldn't be any deadlock as
long as there's no circular dependency among the three.
So think this is a false positive--am I missing something?
> > but 168e4b39d1afb.. says that we could get a deadlock if both are
> > running on the same kworker thread.
> >
> > I'm not sure what to do about that.
> >
>
> The question is if you really do need the call to rpc_killall_tasks and
> the synchronous wait for completion of old tasks? If you don't care,
> then we could just have you call rpc_release_client() in order to
> release your reference on the rpc_client.
No, the waiting's intentional: for example, the laundromat wq is what's
responsible for reaping expired clients, and it wants to wait for
callback rpc's to exit. (If I remember right they may reference data
structures we're about to clean up).
--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists