[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190510050946.GA17994@bridge>
Date: Fri, 10 May 2019 13:09:46 +0800
From: Wenbin Zeng <wenbin.zeng@...il.com>
To: "J. Bruce Fields" <bfields@...ldses.org>
Cc: viro@...iv.linux.org.uk, davem@...emloft.net, jlayton@...nel.org,
trond.myklebust@...merspace.com, anna.schumaker@...app.com,
wenbinzeng@...cent.com, dsahern@...il.com,
nicolas.dichtel@...nd.com, willy@...radead.org,
edumazet@...gle.com, jakub.kicinski@...ronome.com,
tyhicks@...onical.com, chuck.lever@...cle.com, neilb@...e.com,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, linux-nfs@...r.kernel.org
Subject: Re: [PATCH 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1
On Thu, May 09, 2019 at 04:52:18PM -0400, J. Bruce Fields wrote:
> Thanks for figuring this out!
>
> I guess I'll take these patches (with the one fix in your response to
> Al) through the nfsd tree, unless someone tells me otherwise. (The
> original bug was introduced through nfsd.)
Thank you, Bruce.
I am submitting v2 with that fix right away.
>
> How serious are the consequences of the leak? I'm wondering if it's
> worth a stable cc or not.
Though the leak only happens with _privileged_ docker containers that have
gssproxy service enabled and use-gss-proxy set to 1, the consequences
can be ugly, the killed/stopped containers not only leave struct net
unfreed, also possibly leave behind veth devices linked to the netns, in
environments that containers are frequently killed/stopped, it is quite
ugly.
>
> --b.
>
> On Wed, May 01, 2019 at 02:42:22PM +0800, Wenbin Zeng wrote:
> > This patch series fixes an auth_gss bug that results in netns refcount leaks when use-gss-proxy is set to 1.
> >
> > The problem was found in privileged docker containers with gssproxy service enabled and /proc/net/rpc/use-gss-proxy set to 1, the corresponding struct net->count ends up at 2 after container gets killed, the consequence is that the struct net cannot be freed.
> >
> > It turns out that write_gssp() called gssp_rpc_create() to create a rpc client, this increases net->count by 2; rpcsec_gss_exit_net() is supposed to decrease net->count but it never gets called because its call-path is:
> > net->count==0 -> cleanup_net -> ops_exit_list -> rpcsec_gss_exit_net
> > Before rpcsec_gss_exit_net() gets called, net->count cannot reach 0, this is a deadlock situation.
> >
> > To fix the problem, we must break the deadlock, rpcsec_gss_exit_net() should move out of the put() path and find another chance to get called, I think nsfs_evict() is a good place to go, when netns inode gets evicted we call rpcsec_gss_exit_net() to free the rpc client, this requires a new callback i.e. evict to be added in struct proc_ns_operations, and add netns_evict() as one of netns_operations as well.
> >
> > Wenbin Zeng (3):
> > nsfs: add evict callback into struct proc_ns_operations
> > netns: add netns_evict into netns_operations
> > auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when
> > use-gss-proxy==1
> >
> > fs/nsfs.c | 2 ++
> > include/linux/proc_ns.h | 1 +
> > include/net/net_namespace.h | 1 +
> > net/core/net_namespace.c | 12 ++++++++++++
> > net/sunrpc/auth_gss/auth_gss.c | 9 ++++++---
> > 5 files changed, 22 insertions(+), 3 deletions(-)
> >
> > --
> > 1.8.3.1
Powered by blists - more mailing lists