Message-Id: <1195413486.7893.16.camel@heimdal.trondhjem.org>
Date: Sun, 18 Nov 2007 14:18:06 -0500
From: Trond Myklebust <trond.myklebust@....uio.no>
To: Torsten Kaiser <just.for.lkml@...glemail.com>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>,
Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
LKML <linux-kernel@...r.kernel.org>, linuxppc-dev@...abs.org,
nfs@...ts.sourceforge.net, Andy Whitcroft <apw@...dowen.org>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Jan Blunck <jblunck@...e.de>, steved@...hat.com
Subject: Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

On Sun, 2007-11-18 at 19:44 +0100, Torsten Kaiser wrote:
> On Nov 18, 2007 12:05 AM, Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
> > I've been staring at this NFS code for a while and can't make any sense
> > out of it. It seems to correctly initialize the waitqueue. So this would
> > indicate corruption of some sort.
>
> No, it does not "correctly" initialize the waitqueue. It doesn't even
> try to initialize it.
>
> I now found the guilty patch and what is wrong with it.
>
> nfs-stop-sillyname-renames-and-unmounts-from-racing.patch adds:
>
> @@ -110,8 +112,22 @@ struct nfs_server {
> filesystem */
> #endif
> void (*destroy)(struct nfs_server *);
> +
> + atomic_t active; /* Keep trace of any activity to this server */
> + wait_queue_head_t active_wq; /* Wait for any activity to stop */
>
> and tries to initialize it:
> @@ -593,6 +593,10 @@ static int nfs_init_server(struct nfs_server *server,
> server->namelen = data->namlen;
> /* Create a client RPC handle for the NFSv3 ACL management interface */
> nfs_init_server_aclclient(server);
> +
> + init_waitqueue_head(&server->active_wq);
> + atomic_set(&server->active, 0);
> +
>
> and then uses it via nfs_sb_active and nfs_sb_deactive:
>
> @@ -29,6 +29,7 @@ struct nfs_unlinkdata {
> static void
> nfs_free_unlinkdata(struct nfs_unlinkdata *data)
> {
> + nfs_sb_deactive(NFS_SERVER(data->dir));
> iput(data->dir);
> put_rpccred(data->cred);
> kfree(data->args.name.name);
> @@ -151,6 +152,7 @@ static int nfs_do_call_unlink(struct dentry *parent, struct inode *dir, struct n
> nfs_dec_sillycount(dir);
> return 0;
> }
> + nfs_sb_active(NFS_SERVER(dir));
> data->args.fh = NFS_FH(dir);
> nfs_fattr_init(&data->res.dir_attr);
>
>
> But it does not notice this:
> struct dentry_operations nfs_dentry_operations = {
>         .d_revalidate = nfs_lookup_revalidate,
>         .d_delete     = nfs_dentry_delete,
>         .d_iput       = nfs_dentry_iput,
> };
> struct dentry_operations nfs4_dentry_operations = {
>         .d_revalidate = nfs_open_revalidate,
>         .d_delete     = nfs_dentry_delete,
>         .d_iput       = nfs_dentry_iput,
> };
>
> NFSv2/3 and NFSv4 share the same dentry_iput and so share the same
> unlink and sillyrename logic.
> But they do not share nfs_init_server()!
>
> I wonder why this doesn't blow up more violently, but only hangs...
>
> But as I don't know if it is correct to add the waitqueue
> initialization to nfs4_init_server() or remove the nfs_sb_active /
> nfs_sb_deactive for the NFSv4 case, I can't offer a patch to fix this.
>
> Torsten

I had already fixed that one in my own stack. Attached are the 3 patches
that I've got. 1 from SteveD, 2 fixes.

Andrew, could you please unapply the sillyrename patches you've got, and
apply these 3 instead?

Trond
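
For readers following the thread, here is a minimal user-space sketch of
the failure Torsten describes above. It is not kernel code and not taken
from the attached patches. All model_* names are hypothetical stand-ins,
and the assumption that nfs_sb_deactive() wakes the waitqueue when the
counter drops to zero is inferred from the field comments in the quoted
hunk, not confirmed by this message. The point it illustrates: two setup
paths, only one of which initializes active_wq, feeding one shared
teardown path that uses it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for wait_queue_head_t: it only records whether
 * init was ever called, which is all this sketch needs. */
struct model_waitqueue {
	bool initialized;
};

/* Hypothetical stand-in for the two fields added to struct nfs_server. */
struct model_server {
	int active;                       /* models atomic_t active */
	struct model_waitqueue active_wq; /* models active_wq */
};

static void model_init_waitqueue(struct model_waitqueue *wq)
{
	wq->initialized = true;
}

/* Models the NFSv2/3 setup path, which the quoted patch did update. */
static void model_init_server_v3(struct model_server *s)
{
	memset(s, 0, sizeof(*s));
	model_init_waitqueue(&s->active_wq);
	s->active = 0;
}

/* Models the NFSv4 setup path, which the quoted patch did not touch. */
static void model_init_server_v4(struct model_server *s)
{
	memset(s, 0, sizeof(*s));
}

/* Shared by both protocol versions, like the common unlink logic. */
static void model_sb_active(struct model_server *s)
{
	s->active++;
}

/* Shared teardown. Returns false if it would have to wake a waitqueue
 * that was never initialized (assumed behaviour, see the note above). */
static bool model_sb_deactive(struct model_server *s)
{
	s->active--;
	if (s->active == 0)
		return s->active_wq.initialized; /* a wake-up would happen here */
	return true;
}
```

Calling model_sb_active() and then model_sb_deactive() on a server set up
by model_init_server_v3() succeeds, while the same sequence on one set up
by model_init_server_v4() reports use of an uninitialized waitqueue. Either
initializing the fields on the v4 path too, or moving the initialization
into a helper that both paths call, would close the gap in this model.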
Download attachment "linux-2.6.24-005-fix_sillyrename_bug_on_umount.dif" of type "message/rfc822" (4060 bytes)
Download attachment "linux-2.6.24-006-fix_to_fix_sillyrename_bug_on_umount.dif" of type "message/rfc822" (4205 bytes)
Download attachment "linux-2.6.24-007-fix_nfs_free_unlinkdata.dif" of type "message/rfc822" (1255 bytes)
View attachment "series" of type "text/plain" (232 bytes)