[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100513171114.1629086b@notabene.brown>
Date: Thu, 13 May 2010 17:11:14 +1000
From: Neil Brown <neilb@...e.de>
To: "Aneesh Kumar K. V" <aneesh.kumar@...ux.vnet.ibm.com>
Cc: Andreas Dilger <andreas.dilger@...cle.com>, hch@...radead.org,
viro@...iv.linux.org.uk, adilger@....COM, corbet@....net,
serue@...ibm.com, linux-fsdevel@...r.kernel.org,
sfrench@...ibm.com, philippe.deniel@....FR,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH -V7 3/9] vfs: Add name to file handle conversion support
On Thu, 13 May 2010 11:47:46 +0530
"Aneesh Kumar K. V" <aneesh.kumar@...ux.vnet.ibm.com> wrote:
> On Thu, 13 May 2010 08:43:10 +1000, Neil Brown <neilb@...e.de> wrote:
> > On Wed, 12 May 2010 15:49:49 -0600
> > Andreas Dilger <andreas.dilger@...cle.com> wrote:
> >
> > > On 2010-05-12, at 09:50, Aneesh Kumar K.V wrote:
> > > > +static long do_sys_name_to_handle(struct path *path,
> > > > + struct file_handle __user *ufh)
> > > > +{
> > > > + if (handle_size <= f_handle.handle_size) {
> > > > + /* get the uuid */
> > > > + retval = sb->s_op->get_fsid(sb, &this_fs_id);
> > > > + if (!retval) {
> > > > + /*
> > > > + * Now verify whether we get the same vfsmount
> > > > + * if we lookup with uuid. In case we end up having
> > > > + * same uuid for the multiple file systems. When doing
> > > > + * uuid based lookup we would return the first one.So
> > > > + * with name_to_handle if we don't find the same
> > > > + * vfsmount with lookup return EOPNOTSUPP
> > > > + */
> > > > + mnt = fs_get_vfsmount(current, &this_fs_id);
> > > > + if (mnt != path->mnt) {
> > > > + retval = -EOPNOTSUPP;
> > > > + mntput(mnt);
> > > > + goto err_free_out;
> > > > + }
> > >
> > > I don't see that this does anything for us except add overhead. This is no protection against mounting a second filesystem with the same UUID after the handle is returned, since there is no expiration for file handles.
> >
> > I very much agree.
> > Further I have not yet seen a convincing argument that the UUID should be
> > part of the kernel interface for filehandle lookup. That fact that including
> > the UUID suggests to you (Aneesh) that this sort of thing might be a good
> > idea seems to me to be a strong argument against including the UUID in the
> > lookup.
>
>
> The other option I tried was to use mountdir fd to point to the
> vfsmount. The problem with such an approach is that NFS server will now
> have to maintain a cache for file system handle to mountdir mapping.
> (xfsprogs libhandle/handle.c shows what such a cache would look like)
>
> The server would anyhow require to get a system wide unique handle to represent the
> file. ie server will have to generate unique id to represent the
> filesystem and then maintain the unique id to mount dir mapping in the
> user space. Also we have the problem of rebuiling the cache after a
> server restart (due to crash). With the above observation it guess it
> make it easier to have kernel provide file system id which can be
> directly used by the userspace. Hence the descission to use
> UUID. Whether UUID is the right value to represent file system is
> something we should decide.
Here is my reason (well, one of my reasons) why UUID is clearly not an
acceptable handle to use - you *must* use an fd or are worst a path name.
A filesystem can be mounted at multiple places in the namespace and each
instance can have other filesystems mounted on different directories.
Each of these is a 'vfsmnt'.
You need a vfsmnt for most (all?) filesystem operations.
A UUID does not uniquely identify a vfsmnt. It might uniquely identify a
filesystem, though that is debatable. It definitely does not uniquely
identify a vfsmnt.
Therefore, given only a UUID the kernel would have to arbitrarily choose a
vfsmnt that references to the right filesystem. If a particular directory in
the filesystem that you want to access was mounted-on in that vfsmnt, that
directory would be completely inaccessible to you, even though it might be
completely accessible in some other vfsmnt.
So it is quite possible for a scheme based on kernel interpretation of uuids
to fail.
This may be a corner case, but I think people are slowly getting more
adventurous in terms of using the mount table to do interesting things. It
may be less of a corner case in 5 years.
So to tell the kernel which filesystem is of interest for filehandle
lookup, you *must* give it a name, whether a textual path, or a filehandle
obtained by opening a textual path.
knfsd creates a cache in the kernel by telling it that a particular
handle-fragment maps to a particular path. The handle-fragment is normally
a uuid by default, but it is trivial to choose something else via
configuration in userspace if a corner-case needs to be handled.
Creating such a cache and rebuilding after server-restart is not really very
difficult.
NeilBrown
>
> >
> > >
> > > At best I think we could start by changing the list-based UUID lookup with a hash-based one, and when adding a duplicate UUID at mount time start by printing out an error message to the console in case of duplicated UUIDs, and maybe at some point in the future this might cause the mount to fail (though I don't think we can make that decision lightly or quickly).
> > >
> > > That moves the overhead to mount time instead of for each name_to_handle() call (which would be brutal for a system with many filesystems mounted).
> > >
>
> If we implement the above, can we say UUID can be used to identify the
> file system ?
>
>
> > > > +SYSCALL_DEFINE4(name_to_handle_at, int, dfd, const char __user *, name,
> > > > + struct file_handle __user *, handle, int, flag)
> > > > +{
> > > > + int follow;
> > > > + long ret = -EINVAL;
> > > > + struct path path;
> > > > +
> > > > + if ((flag & ~AT_SYMLINK_NOFOLLOW) != 0)
> > > > + goto err_out;
> > >
> > > Won't this break the use of AT_FDCWD to get a relative file handle?
> >
> > No, AT_FDCWD is a value passed in 'dfd', not a flag.
> >
> > I wonder though if the default should be to not follow symlinks, and that
> > AT_SYMLINK_FOLLOW should be required if you really want to follow symlinks.
> > I suspect that in most cases you wouldn't want to follow symlinks.
> > It isn't a really important point though.
> >
>
> Will fix that in next iteration.
>
> -aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists