Date:	Wed, 23 Mar 2011 08:54:37 +0100
From:	Wolfgang Walter <wolfgang.walter@...m.de>
To:	"J. Bruce Fields" <bfields@...ldses.org>
Cc:	Trond Myklebust <Trond.Myklebust@...app.com>,
	linux-kernel@...r.kernel.org, linux-nfs@...r.kernel.org
Subject: Re: problem with nfs4: rpciod seems to loop in rpc_shutdown_client forever

On Tuesday, 22 March 2011, J. Bruce Fields wrote:
> On Tue, Mar 22, 2011 at 03:52:21PM +0100, Wolfgang Walter wrote:
> > On Tuesday, 22 March 2011, J. Bruce Fields wrote:
> > > On Fri, Mar 18, 2011 at 11:49:21PM +0100, Wolfgang Walter wrote:
> > > > Hello,
> > > >
> > > > I have a problem with our nfs-server (stable 2.6.32.33 but also with
> > > > .31 or .32 and probably older ones): sometimes
> > > > one or more rpciod get stuck. I used
> > > >
> > > > 	rpcdebug -m rpc -s all
> > > >
> > > > I get messages like the following about once a second:
> > > >
> > > > Mar 18 11:15:37 au kernel: [44640.906793] RPC:       killing all tasks for client ffff88041c51de00
> > > > Mar 18 11:15:38 au kernel: [44641.906793] RPC:       killing all tasks for client ffff88041c51de00
> > > > Mar 18 11:15:39 au kernel: [44642.906795] RPC:       killing all tasks for client ffff88041c51de00
> > > > Mar 18 11:15:40 au kernel: [44643.906793] RPC:       killing all tasks for client ffff88041c51de00
> > > > Mar 18 11:15:41 au kernel: [44644.906795] RPC:       killing all tasks for client ffff88041c51de00
> > > > Mar 18 11:15:42 au kernel: [44645.906794] RPC:       killing all tasks for client ffff88041c51de00
> > > >
> > > > and I get this message:
> > > >
> > > > Mar 18 22:45:57 au kernel: [86061.779008]   174 0381     -5
> > > > ffff88041c51de00   (null)        0 ffffffff817211a0 nfs4_cbv1 CB_NULL
> > > > a:rpc_exit_task q:none
> > > >
> > > > My theory is this:
> > > >
> > > > * this async task is runnable but does not progress (calling
> > > >   rpc_exit_task).
> > > > * this is because the same rpciod which handles this task loops in
> > > >   rpc_shutdown_client, waiting for this task to go away.
> > > > * this happens because rpc_shutdown_client is itself called from an
> > > >   async RPC task.
> > >
> > > Off hand I don't see any place where rpc_shutdown_client() is called
> > > from rpciod; do you?
> >
> > I'm not familiar with the code.
> >
> > But could it be that this is in fs/nfsd/nfs4state.c ?
> >
> > Just a guess because 2.6.38 does not have this problem and in 2.6.38 it
> > seems to have a workqueue of its own.
>
> Well spotted. Yes, it's true that 2.6.32 called put_nfs4_client()
> from an rpc_call_done callback, that put_nfs4_client() can end up
> calling rpc_shutdown_client, and that that's since been fixed....
>
> If someone wants to backport the fix to 2.6.32.y....
>
> Actually I think it might be sufficient just to apply
> 147efd0dd702ce2f1ab44449bd70369405ef68fd ?  But I haven't tried.
>
> --b.
>
> commit 147efd0dd702ce2f1ab44449bd70369405ef68fd
> Author: J. Bruce Fields <bfields@...i.umich.edu>
> Date:   Sun Feb 21 17:41:19 2010 -0800
>
>     nfsd4: shutdown callbacks on expiry
>
>     Once we've expired the client, there's no further purpose to the
>     callbacks; go ahead and shut down the callback client rather than
>     waiting for the last reference to go.
>
>     Signed-off-by: J. Bruce Fields <bfields@...i.umich.edu>
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index efef7f2..9ce5831 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -697,9 +697,6 @@ shutdown_callback_client(struct nfs4_client *clp)
>  static inline void
>  free_client(struct nfs4_client *clp)
>  {
> -	shutdown_callback_client(clp);
> -	if (clp->cl_cb_xprt)
> -		svc_xprt_put(clp->cl_cb_xprt);
>  	if (clp->cl_cred.cr_group_info)
>  		put_group_info(clp->cl_cred.cr_group_info);
>  	kfree(clp->cl_principal);
> @@ -752,6 +749,9 @@ expire_client(struct nfs4_client *clp)
>  				 se_perclnt);
>  		release_session(ses);
>  	}
> +	shutdown_callback_client(clp);
> +	if (clp->cl_cb_xprt)
> +		svc_xprt_put(clp->cl_cb_xprt);
>  	put_nfs4_client(clp);
>  }

I'll test it this weekend.

I use 2.6.38 on this server for now and will probably stay with it. But having
a working longterm kernel to fall back on is important for me :-).
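
The suspected hang can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name in it (struct worker, model_shutdown_client, done_callback_inline and so on) is hypothetical and not kernel code, a single queue stands in for one rpciod thread, and the wait loop is bounded so that the demo terminates, whereas the wait observed in the logs above never ended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: every name below is hypothetical, not a kernel API. */
#define MAX_TASKS 8

struct worker;
typedef void (*task_fn)(struct worker *w);

struct worker {
    task_fn queue[MAX_TASKS];
    int head, tail;        /* tasks in [head, tail) have not run yet */
    int stalled_polls;     /* polls in shutdown that saw no progress */
};

static void enqueue(struct worker *w, task_fn fn)
{
    assert(w->tail < MAX_TASKS);
    w->queue[w->tail++] = fn;
}

static int pending(const struct worker *w) { return w->tail - w->head; }

/* Stand-in for rpc_shutdown_client(): poll until the client's tasks are gone.
 * The poll count is bounded so that the demo terminates. */
static void model_shutdown_client(struct worker *w, int max_polls)
{
    for (int i = 0; i < max_polls && pending(w) > 0; i++) {
        printf("killing all tasks for client (pending=%d)\n", pending(w));
        w->stalled_polls++;
    }
}

/* Stand-in for the task stuck at rpc_exit_task: trivial once it gets to run. */
static void exit_task(struct worker *w) { (void)w; }

/* Stand-in for a call_done callback that drops the last client reference. */
static void done_callback_inline(struct worker *w)
{
    model_shutdown_client(w, 3);  /* waits on the worker it is running on */
}

/* Run queued tasks one at a time, as a single rpciod thread would. */
static void run_worker(struct worker *w)
{
    while (pending(w) > 0) {
        task_fn fn = w->queue[w->head++];
        fn(w);
    }
}

/* Shutdown from inside the worker: returns the number of stalled polls. */
static int demo_inline_shutdown(void)
{
    struct worker w = {0};
    enqueue(&w, done_callback_inline);
    enqueue(&w, exit_task);
    run_worker(&w);
    return w.stalled_polls;
}

/* Shutdown from a separate context, after the worker has drained its queue. */
static int demo_deferred_shutdown(void)
{
    struct worker w = {0};
    enqueue(&w, exit_task);
    run_worker(&w);
    model_shutdown_client(&w, 3);
    return w.stalled_polls;
}
```

In this model demo_inline_shutdown() uses up every poll without progress, because the task being waited for can only run after the callback returns. demo_deferred_shutdown() never has to wait at all, which is roughly the effect one would expect from doing the shutdown outside rpciod.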

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts