[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <25967.1163604494@redhat.com>
Date: Wed, 15 Nov 2006 15:28:14 +0000
From: David Howells <dhowells@...hat.com>
To: Trond Myklebust <trond.myklebust@....uio.no>
Cc: David Howells <dhowells@...hat.com>,
Stephen Smalley <sds@...ho.nsa.gov>,
Karl MacMillan <kmacmill@...hat.com>, jmorris@...ei.org,
chrisw@...s-sol.org, selinux@...ho.nsa.gov,
linux-kernel@...r.kernel.org, aviro@...hat.com
Subject: Re: Security issues with local filesystem caching
Trond Myklebust <trond.myklebust@....uio.no> wrote:
> No. All you should need is the result of the lookup().
Plus the mkdirs, etc., but I'll assume you implied those.
> The coherency data needs to be checked against an eventual existing
> CacheFiles entry during the call to read(), not before.
Or mmap().
But, yes, whilst that's true; the read() is almost certainly going to occur
very quickly after the open(), and so the likelyhood is that asynchronicity is
not going to matter.
> Umm...because the former is a model which actually fits your security
> requirements (i.e. one privileged daemon gets lookup()+open()+mkdir()...
> rights on the CacheFiles partition), whereas the latter is not (all
> tasks need lookup()+open()+mkdir().... privileges)?
The former model is much more complex than the latter, and also has perfomance
issues.
Besides, the latter model seems to work for me (as long as I'm allowed to
change the security ID a process is acting as).
> > If so, that'd be nfs_readpage_to_fscache() as called from
> > nfs_readpage_sync() or nfs_readpage_release().
>
> nfs_readpage_release() is an rpciod context, _not_ a user thread
> context.
But it *is* process context...
I'm not sure what to do about that. I could maintain a record of every page
being serviced by FS-Cache, but that could be an awful lot of pages, and so
consume an awful lot of memory.
I'm hesitant about allocating memory to hold a list of pages in that context as
it might fail.
> > However, that may do other disk accesses, I suppose, as it calls
> > prepare_write() and commit_write() on ext3.
>
> Which would generally be forbidden under the rpciod context, BTW, since
> they imply calls to generic memory allocation (== nasty tricksy deadlock
> potential, since rpciod may be called upon to help write out NFS pages
> via shrink_page_list and friends).
I wish you'd mentioned that six months or more ago when I first waved these
patches in front of you.
> > vfs_mkdir() is synchronous. vfs_create() is synchronous. vfs_[sg]etxattr
> > is synchronous. Lookup is synchronous.
>
> All of them are synchronous as far as accessing the remote filesystem is
> concerned.
What has "accessing the remote filesystem" got to do with it?
> Why would the user process care if a privileged daemon has completed the
> shadow mkdir() or create() on the CacheFiles system or not?
Sigh.
The user process may not do a read() or an mmap() on a read-only file it has
just opened until we've found the data storage object in the cache associated
with that file, or it has created one.
However, several user processes may be accessing several different files on
serveral different mounts of maybe several different netfs's all at the same
time. This could mean that on behalf of each of these processes, CacheFiles
has to create serveral directories and a file.
But it may only have one thread in which to do this, and so each "cache lookup"
will wind up being serialised around the synchronous ext3 operations that
create files and directories and set xattrs. Consider:
SERVICE PROC A PROC B PROC C PROC D
======= ======= ======= ======= =======
open() open() open() open()
A:lookup()
A:mkdir()
A:setxattr()
A:mkdir()
A:setxattr()
A:mkdir()
A:setxattr()
A:mkdir()
A:setxattr()
A:mkdir()
A:setxattr()
A:create()
A:setxattr()
read()
B:lookup()
B:mkdir()
B:setxattr()
B:mkdir()
B:setxattr()
B:mkdir()
B:setxattr()
B:mkdir()
B:setxattr()
B:mkdir()
B:setxattr()
B:create()
B:setxattr()
read()
C:lookup()
C:mkdir()
C:setxattr()
C:mkdir()
C:setxattr()
C:mkdir()
C:setxattr()
C:mkdir()
C:setxattr()
C:mkdir()
C:setxattr()
C:create()
C:setxattr()
read()
D:lookup()
D:mkdir()
D:setxattr()
D:mkdir()
D:setxattr()
D:mkdir()
D:setxattr()
D:mkdir()
D:setxattr()
D:mkdir()
D:setxattr()
D:create()
D:setxattr()
read()
Which is what you're suggesting. Whereas what I have at the moment is this:
PROC A PROC B PROC C PROC D
=============== =============== =============== ===============
open() open() open() open()
A:lookup() B:lookup() C:lookup() D:lookup()
A:mkdir() B:mkdir() C:mkdir() D:mkdir()
A:setxattr() B:setxattr() C:setxattr() D:setxattr()
A:mkdir() B:mkdir() C:mkdir() D:mkdir()
A:setxattr() B:setxattr() C:setxattr() D:setxattr()
A:mkdir() B:mkdir() C:mkdir() D:mkdir()
A:setxattr() B:setxattr() C:setxattr() D:setxattr()
A:mkdir() B:mkdir() C:mkdir() D:mkdir()
A:setxattr() B:setxattr() C:setxattr() D:setxattr()
A:mkdir() B:mkdir() C:mkdir() D:mkdir()
A:setxattr() B:setxattr() C:setxattr() D:setxattr()
A:create() B:create() C:create() D:create()
A:setxattr() B:setxattr() C:setxattr() D:setxattr()
read() read() read() read()
Even if you replace the mkdir()'s and create()'s with lookup()'s and the
setxattr()'s with getxattr()'s, it doesn't make much difference: lookup() and
getxattr() are both reads and both synchronous.
> > Remember: in the common case, open(O_RDONLY) is going to be followed
> > quickly by a read(). I suppose there may be an intervening stat() and
> > malloc(), but even so...
>
> Which is why lookup() + open() + read() needs to be fast in the case
> where you have a CacheFiles hit. It does not justify mkdir, create, etc
> being fast, nor does it justify the open() + read() part needing to be
> fast in the case of a CacheFiles miss.
My point is that if userspace does open() + read() with no or little
intervening stuff, then asynchronicity is irrelevant as the whole process is
rendered effectively synchronous because the first read() has to wait on
completion.
And once an inode has been assayed for cache presence, subsequent open() calls
don't need to go to the cache as the state is retained in memory until the
inode is discarded.
David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists