Message-Id: <51D3ECAD-F5AA-4090-91EE-0B3A2C67F335@oracle.com>
Date: Fri, 2 Jul 2010 16:47:56 -0600
From: Andreas Dilger <andreas.dilger@...cle.com>
To: Neil Brown <neilb@...e.de>
Cc: hch@...radead.org,
"Aneesh Kumar K. V" <aneesh.kumar@...ux.vnet.ibm.com>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"adilger@....com" <adilger@....com>,
"corbet@....net" <corbet@....net>,
"serue@...ibm.com" <serue@...ibm.com>,
"hooanon05@...oo.co.jp" <hooanon05@...oo.co.jp>,
"bfields@...ldses.org" <bfields@...ldses.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"sfrench@...ibm.com" <sfrench@...ibm.com>,
"philippe.deniel@....FR" <philippe.deniel@....FR>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH -V14 0/11] Generic name to handle and open by handle syscalls
On 2010-07-02, at 16:09, Neil Brown wrote:
> On Fri, 2 Jul 2010 10:12:47 -0600
> Andreas Dilger <andreas.dilger@...cle.com> wrote:
>>
>> I haven't looked at this part of the VFS in a while, but it looks like reconnect_path() is an implementation issue specific to knfsd, and shouldn't be needed for regular files. i.e. if exportfs_encode_fh() is never used on a disconnected file, then this overhead is not incurred.
>>
>> The above use of open_by_handle() is not for userspace NFS/Samba re-export, but to allow applications to open regular files for IO.
>
> Firstly it is needed for directories so that the VFS can effectively lock
> against directory rename races which could otherwise create disconnected
> subtrees (where the first parent is a member only of one of its
> descendants). So if you get a filehandle for a directory it *must* be
> properly connected to the root for rename to be safe. This operation is
> faster than a full path lookup if the dentry is already in cache, and slower
> if it or any part of the path is not in cache.
OK, so this requirement is specific for directories, and not at all needed for regular files.
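As a rough illustration of why a directory handle must be connected, here is a toy model in C. It is not the kernel's dcache or reconnect_path() code; the struct toy_dir type and toy_is_connected() function are hypothetical. Each directory knows only its parent, and the check follows ".." links (using Floyd's cycle detection) to see whether the root is reached, or whether a rename race has left a subtree whose ancestry loops back through one of its own descendants.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical, not kernel code): a directory knows only its
 * parent, every node has a non-NULL parent, and the root is its own parent. */
struct toy_dir {
    const char *name;
    struct toy_dir *parent;
};

/* Return 1 if following ".." links from DIR reaches ROOT, or 0 if the
 * chain loops without ever reaching it (a subtree renamed beneath one of
 * its own descendants). Uses Floyd's tortoise-and-hare cycle detection. */
int toy_is_connected(const struct toy_dir *dir, const struct toy_dir *root)
{
    const struct toy_dir *slow = dir, *fast = dir;

    assert(dir != NULL && root != NULL);
    for (;;) {
        /* Checking fast and its parent covers every step of the walk. */
        if (fast == root || fast->parent == root)
            return 1;
        fast = fast->parent->parent;
        slow = slow->parent;
        if (fast == slow)
            return 0;   /* cycle that does not contain the root */
    }
}
```

Making a directory the child of its own descendant produces exactly the kind of loop described above, and the check then reports the subtree as disconnected.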
> Secondly it is needed if you want to enforce the rule that the contents of a
> directory are only accessible if the 'x' bit on the directory is set.
> kNFSd does not enforce this (unless subtree_check is specified), partly
> because it is hard to do correctly and partly because we have to trust the
> client anyway, so trusting it to check the 'x' bit is very little extra trust.
If the application that called name_to_handle() already had to traverse the whole pathname to get the file handle, then there shouldn't necessarily be a requirement to do this when calling open_by_handle(). The only possible permission checking in open_by_handle() is the permission on the inode itself.
> Note that it is not possible to reliably perform filehandle lookup for
> non-directories if you need a fully reconnected dentry, as
> cross-directory-renames can confuse the situation beyond recovery.
For normal file IO, a fully connected dentry is not needed, and in fact the handle_to_path->exportfs_decode_fh() code will accept any inode alias for regular file use.
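For reference, the interface that was eventually merged (Linux 2.6.39, exposed by glibc 2.14 as name_to_handle_at() and open_by_handle_at()) differs from the patch series discussed in this thread. The sketch below uses those merged calls to re-open a file by handle for ordinary I/O; the reopen_by_handle() helper is hypothetical. Note that the merged open_by_handle_at() requires CAP_DAC_READ_SEARCH, so without that capability the second call fails with EPERM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: encode a handle for PATH with name_to_handle_at(2),
 * then open it again with open_by_handle_at(2). MOUNT_FD is any open fd on
 * the same filesystem. Returns a new fd, or -1 with errno set. */
int reopen_by_handle(int mount_fd, const char *path)
{
    struct file_handle *fh;
    int mount_id, fd, saved;

    assert(path != NULL);
    fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (fh == NULL)
        return -1;
    fh->handle_bytes = MAX_HANDLE_SZ;   /* room for the largest handle */

    /* Encoding a handle needs no special privilege. */
    if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) == -1) {
        saved = errno;
        free(fh);
        errno = saved;
        return -1;
    }

    /* Decoding needs CAP_DAC_READ_SEARCH in the merged syscall. For a
     * regular file the result may be any alias of the inode, which does
     * not matter for read()/write(). */
    fd = open_by_handle_at(mount_fd, fh, O_RDONLY);
    saved = errno;
    free(fh);
    errno = saved;
    return fd;
}
```

Run unprivileged, or on a filesystem that cannot export handles, the helper returns -1 and the errno value tells which restriction applied.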
> Maybe open-by-handle should require DAC_OVERRIDE, or maybe a new
> DAC_X_OVERRIDE. And if those aren't provided it only works for directories.
That's the big question. If the file handle has some "non-public" information in it (i.e. a capability that cannot be (easily) guessed or forged), then there should not be any need for DAC_OVERRIDE. This could easily be enforced if there were a provision for "short-term" file handles that only had to live a few minutes or less; the kernel could then store a random cookie in each file handle and require applications to get a new handle if the cookie expires or the server crashes.
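The short-term handle idea can be sketched as follows. Everything here is hypothetical and corresponds to no real kernel ABI: each issued handle carries a random cookie and an expiry time, the issuing side remembers what it handed out, and clearing that table models a server crash. In real use the cookie would come from a cryptographically secure source such as getrandom(2); the test passes a fixed value only to stay deterministic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define TOY_SLOTS 16

/* Hypothetical handle layout, not any real kernel ABI. */
struct toy_handle {
    uint64_t ino;
    uint32_t gen;
    uint64_t cookie;    /* random, unguessable value */
};

struct toy_issued {
    struct toy_handle h;
    time_t expires;     /* absolute expiry time */
    int used;
};

/* Per-boot table of handles handed out; lost on crash or reboot. */
struct toy_table {
    struct toy_issued slot[TOY_SLOTS];
};

void toy_reset(struct toy_table *t)
{
    memset(t, 0, sizeof(*t));   /* forget every outstanding handle */
}

/* Issue a handle for (INO, GEN) valid until NOW + LIFETIME. COOKIE should
 * come from a CSPRNG in real use. Returns 0, or -1 if the table is full. */
int toy_issue(struct toy_table *t, uint64_t ino, uint32_t gen, uint64_t cookie,
              time_t now, time_t lifetime, struct toy_handle *out)
{
    size_t i;

    assert(t != NULL && out != NULL);
    for (i = 0; i < TOY_SLOTS; i++) {
        struct toy_issued *s = &t->slot[i];
        if (s->used && s->expires > now)
            continue;           /* slot still holds a live handle */
        s->h.ino = ino;
        s->h.gen = gen;
        s->h.cookie = cookie;
        s->expires = now + lifetime;
        s->used = 1;
        *out = s->h;
        return 0;
    }
    return -1;
}

/* A handle is accepted only if it exactly matches a live entry. */
int toy_validate(const struct toy_table *t, const struct toy_handle *h, time_t now)
{
    size_t i;

    for (i = 0; i < TOY_SLOTS; i++) {
        const struct toy_issued *s = &t->slot[i];
        if (s->used && now < s->expires &&
            s->h.ino == h->ino && s->h.gen == h->gen &&
            s->h.cookie == h->cookie)
            return 1;
    }
    return 0;
}
```

A handle with a flipped cookie bit, a handle presented after its expiry, and a handle presented after toy_reset() are all rejected, which is the behaviour the proposal relies on.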
However, even a "plain" file handle containing only the inode and generation numbers is relatively secure in this respect. The only ways to learn the inode number of a particular file are "ls -li" (which requires "x" traversal permission on the path) or guessing it, and the only way to learn the generation number is ioctl(FS_IOC_GETVERSION), which requires being able to open the inode already.
Guessing the inode number by itself is fairly weak, at most 2^32 inodes in most filesystems, usually far fewer. Guessing the generation number is much harder (though not impossible).
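A back-of-the-envelope count of the guessing work, assuming a 32-bit generation number (the size of i_generation in the ext2/3/4 inode) that an attacker cannot predict. The guess_bits() helper is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bits an attacker must guess to forge a plain
 * (inode, generation) handle for one specific file. INODE_RANGE is the
 * number of possible inode numbers; if INODE_KNOWN is nonzero the inode
 * number is already known and costs nothing. GEN_BITS assumes the
 * generation number is uniformly unpredictable. */
unsigned int guess_bits(uint64_t inode_range, int inode_known, unsigned int gen_bits)
{
    unsigned int bits = 0;

    assert(gen_bits <= 64);
    if (!inode_known) {
        /* ceil(log2(inode_range)), computed without floating point */
        while (bits < 64 && (UINT64_C(1) << bits) < inode_range)
            bits++;
    }
    return bits + gen_bits;
}
```

With a 2^32 inode range and an unknown 32-bit generation the attacker faces 64 bits; if the inode number is already known (for example from "ls -li"), only the 32 generation bits remain.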
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.