Message-Id: <C67C0244-EBA8-4E8C-94D8-E815DC1F979D@oracle.com>
Date: Wed, 10 Sep 2008 16:54:15 -0400
From: Chuck Lever <chuck.lever@...cle.com>
To: ebiederm@...ssion.com
Cc: chucklever@...il.com, "Cedric Le Goater" <clg@...ibm.com>,
"Serge E. Hallyn" <serue@...ibm.com>,
"Andrew Morton" <akpm@...ux-foundation.org>,
"Trond Myklebust" <trond.myklebust@....uio.no>,
"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
"Linux Containers" <containers@...ts.osdl.org>,
linux-nfs@...r.kernel.org
Subject: Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared

On Sep 10, 2008, at 4:02 PM, ebiederm@...ssion.com wrote:
> "Chuck Lever" <chuck.lever@...cle.com> writes:
>> That makes sense.
>>
>> This is likely coming from lockd_down(), and is almost certainly not
>> coming from the same uts namespace as the lockd_up() that did the
>> pmap_set, which was done by the first NFS mount done in the first uts
>> namespace on the system. It's just something that the kernel has to
>> do for maintenance.
>>
>> There is only one lockd() instance that is shared among all the uts
>> namespaces, right? In this case, what is the correct utsname to use?
>
> Interesting.
>
> As a general rule I would say we should capture the uts instance
> in lockd_up(), and use the same instance in lockd_down().
>
> I'm not at all familiar with how lockd interacts with nfs mounts
> in a practical sense. Is there one lockd instance (or at least
> context) per nfs mount?
>
> The way I would expect things to work is that when we mount an nfs
> filesystem from an nfs server, we would create a lockd context for
> that server, which additional nfs mounts to the same nfs server
> could share.

There is one lockd, one statd, and one rpcbind per client. These are
shared between all the NFS mounts on the client. Likewise, there is
one of each of these per server, and they are shared among all exports.

lockd_up() and lockd_down() maintain a count of mounts and exports,
and lockd_down() shuts down lockd when the count goes to zero.

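
The counting done by lockd_up() and lockd_down() can be pictured with a
small user-space model. This is only a sketch of the idea, not the
kernel's lockd code: struct lockd_model, model_lockd_up() and
model_lockd_down() are made-up names, and the real functions also deal
with locking, thread startup, and rpcbind registration.

```c
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's lockd implementation. */
struct lockd_model {
    unsigned int users;   /* mounts plus exports currently using lockd */
    bool running;         /* true while the single shared instance is up */
};

/* Called for each new mount or export; starts lockd on first use. */
static void model_lockd_up(struct lockd_model *l)
{
    if (l->users++ == 0)
        l->running = true;
}

/* Called when a mount or export goes away; stops lockd on last use. */
static void model_lockd_down(struct lockd_model *l)
{
    if (l->users == 0)
        return;           /* unbalanced call: nothing to release */
    if (--l->users == 0)
        l->running = false;
}
```

Note that the model keeps no record of who called model_lockd_up(), so
the final model_lockd_down() can come from a different context than the
first up call. That is the situation being discussed in this thread.
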
statd provides the ability to signal a server when a client reboots
(and vice versa). This gives the server an indication of when to free
locks for any applications on a rebooting client, and gives the client
an indication of when it needs to reclaim locks on a rebooting server.

statd (user space) and lockd (kernel) have to share a cookie
(mon_name) which is used to identify the client to servers, and the
server to clients, so reboots can be detected. That cookie would
probably need to be the initial utsname.

> The way I would expect nfs to interact with the namespaces is for
> the nfs mount to capture the uts and network namespaces, and use
> them for all transactions relating to the mount.

That works for the main NFS protocol, perhaps, but the auxiliary
protocols are another matter. They operate on behalf of a whole
client or server, not on behalf of an individual mount or export.

> In particular when creating or looking up a lockd context the nfs
> mount would use the uts namespace and the network namespace as
> discriminators to see if an existing lockd context is the same.

Possible, but I would expect this to be a lot of work for not much
gain. The right answer is likely that you need a lockd and statd
instance (virtual or real) for each namespace. The mounts and exports
in each namespace would have their own lockd and statd.

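
To make the per-namespace idea a little more concrete, here is a rough
user-space sketch of a lookup that uses the uts and network namespaces
as discriminators. Nothing here is existing kernel code: struct
lockd_ctx, lockd_ctx_get(), lockd_ctx_put(), and the fixed-size table
are invented for illustration, and opaque pointers stand in for the
namespaces.

```c
#include <stddef.h>

#define MAX_CTX 8   /* arbitrary table size for the sketch */

/* Hypothetical per-namespace lockd context. */
struct lockd_ctx {
    const void *uts_ns;   /* stand-in for the uts namespace */
    const void *net_ns;   /* stand-in for the network namespace */
    unsigned int users;   /* mounts and exports sharing this context */
};

static struct lockd_ctx ctx_table[MAX_CTX];

/* Return the context for (uts_ns, net_ns), creating it if needed.
 * Returns NULL if the table is full. */
static struct lockd_ctx *lockd_ctx_get(const void *uts_ns, const void *net_ns)
{
    struct lockd_ctx *free_slot = NULL;

    for (size_t i = 0; i < MAX_CTX; i++) {
        struct lockd_ctx *c = &ctx_table[i];

        if (c->users > 0 && c->uts_ns == uts_ns && c->net_ns == net_ns) {
            c->users++;           /* same namespaces: share the context */
            return c;
        }
        if (c->users == 0 && free_slot == NULL)
            free_slot = c;        /* remember the first unused slot */
    }
    if (free_slot == NULL)
        return NULL;
    free_slot->uts_ns = uts_ns;
    free_slot->net_ns = net_ns;
    free_slot->users = 1;
    return free_slot;
}

/* Drop one reference; the slot becomes reusable when users reaches zero. */
static void lockd_ctx_put(struct lockd_ctx *c)
{
    if (c != NULL && c->users > 0)
        c->users--;
}
```

Two mounts made from the same pair of namespaces would share one
context, while a mount from a different uts namespace would get its
own. A real implementation would also need a statd and an rpcbind
registration per context, which is where most of the work lies.
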
> I don't think nfs has a 1-1 thread to context model which is where
> things get really hazy for me.

Users are assigned credentials. The credentials are passed from
client to server, and the server does work on behalf of that
credential (user). lockd uses a credential and a process identifier
to find locks on files.

AUTH_SYS credentials (the lowest common denominator) are constructed
from the user's UID and GID and the client's utsname.

The kernel, then, will have to construct unique credentials for users
in each uts namespace. This is likely not an NFS mount-time issue,
but is instead part of the mechanism of mapping requests from
processes to RPC credentials.

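
As an illustration of why the utsname matters here, the sketch below
fills in an in-memory mirror of the AUTH_SYS credential body
(authsys_parms in the ONC RPC specification, which carries a stamp, a
machine name of up to 255 bytes, a uid, a gid, and up to 16
supplementary gids). The struct layout and authsys_cred_init() are
hypothetical and are not the kernel's representation; the nodename is
passed in by the caller and is assumed to come from the uts namespace
of the process making the request.

```c
#include <stdint.h>
#include <string.h>

#define AUTHSYS_MAX_MACHINENAME 255  /* limit from the authsys_parms definition */
#define AUTHSYS_MAX_GIDS 16          /* limit from the authsys_parms definition */

/* Hypothetical in-memory mirror of authsys_parms. */
struct authsys_cred {
    uint32_t stamp;
    char machinename[AUTHSYS_MAX_MACHINENAME + 1];
    uint32_t uid;
    uint32_t gid;
    uint32_t ngids;
    uint32_t gids[AUTHSYS_MAX_GIDS];
};

/* Build a credential from a uid, a gid, and the nodename of the
 * caller's uts namespace. Returns 0 on success, -1 if the nodename
 * does not fit in the machine name field. */
static int authsys_cred_init(struct authsys_cred *c, const char *nodename,
                             uint32_t uid, uint32_t gid)
{
    size_t len = strlen(nodename);

    if (len > AUTHSYS_MAX_MACHINENAME)
        return -1;
    memset(c, 0, sizeof(*c));               /* also NUL-terminates machinename */
    memcpy(c->machinename, nodename, len);
    c->uid = uid;
    c->gid = gid;
    return 0;
}
```

The same uid and gid in two uts namespaces with different nodenames
produce credentials that differ only in the machine name, so the
namespace has to be known at the point where a request is mapped to an
RPC credential.
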
> The conservative play is to always force use of the initial namespace
> and to deny creation of mounts that would use different namespaces,
> in part because the initial version of the namespace always exists.
> Which means, as relates to Cedric's initial patch, we would still need
> to know which mounts should cause us to use a different uts namespace
> so we can deny them.

OK. I think what you are saying is that NFS won't work outside of the
initial uts namespace, for now?

Also, how would an automounter fit into this uts namespace scheme?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com